An Intelligent Optimization Algorithm for Constructing a DNA Storage Code: NOL-HHO

2020 ◽  
Vol 21 (6) ◽  
pp. 2191 ◽  
Author(s):  
Qiang Yin ◽  
Ben Cao ◽  
Xue Li ◽  
Bin Wang ◽  
Qiang Zhang ◽  
...  

The high density, large capacity, and long-term stability of DNA molecules make them an emerging storage medium that is especially suitable for the long-term storage of large datasets. DNA sequences used for storage must satisfy constraints, such as the no-runlength constraint, GC content, and Hamming distance, to avoid nonspecific hybridization reactions. In this work, a new nonlinear control parameter strategy and a random opposition-based learning strategy were used to improve the Harris hawks optimization algorithm, yielding the improved algorithm NOL-HHO, in order to prevent it from falling into local optima. Experimental testing was performed on 23 widely used benchmark functions, and the proposed algorithm was used to obtain better coding lower bounds for DNA storage. The results show that our algorithm maintains a smoother transition between exploration and exploitation and has stronger global exploration capabilities than other algorithms. At the same time, the improvement of the lower bound directly affects the storage capacity and code rate, which promotes the further development of DNA storage technology.
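
The three constraints named in this abstract can be sketched as simple predicates. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the GC target of exactly half the bases (rounded down) is an assumed convention.

```python
def no_runlength(seq: str) -> bool:
    """True if no two adjacent bases are identical (no homopolymer runs)."""
    return all(a != b for a, b in zip(seq, seq[1:]))


def gc_count(seq: str) -> int:
    """Number of bases that are G or C."""
    return sum(base in "GC" for base in seq)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_valid_code(codewords: list[str], min_distance: int) -> bool:
    """Check all three constraints over a whole set of codewords."""
    for i, word in enumerate(codewords):
        # Assumed GC constraint: exactly floor(n / 2) bases are G or C.
        if not no_runlength(word) or gc_count(word) != len(word) // 2:
            return False
        # Every earlier codeword must be at least min_distance away.
        if any(hamming(word, other) < min_distance for other in codewords[:i]):
            return False
    return True
```

For example, `is_valid_code(["ACGT", "CATG"], 3)` accepts the pair, while adding a codeword that differs from `ACGT` in a single position would be rejected for any minimum distance above 1.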

2020 ◽  
Author(s):  
Shadi Zabad ◽  
Alan M Moses

Abstract: We study the evolution of quantitative molecular traits in the absence of selection. Using a simple theory based on Felsenstein’s 1981 DNA substitution model, we predict a linear restoring force on the mean of an additive phenotype. Remarkably, the mean dynamics are independent of the effect sizes and genotype and are similar to the widely-used OU model for stabilizing selection. We confirm the predictions empirically using additive molecular phenotypes calculated from ancestral reconstructions of putatively unconstrained DNA sequences in primate genomes. We show that the OU model is favoured by inference software even when applied to GC content of unconstrained sequences or simulations of DNA evolution. We predict and confirm empirically that the dynamics of the variance are more complicated than those predicted by the OU model, and show that our results for the restoring force of mutation hold even for non-additive phenotypes, such as number of transcription factor binding sites, longest encoded peptide and folding propensity of the encoded peptide. Our results have implications for efforts to infer selection based on quantitative phenotype dynamics as well as to understand long-term trends in evolution of quantitative molecular traits.
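
To make the phrase "linear restoring force on the mean" concrete, the sketch below evaluates the standard Ornstein-Uhlenbeck (OU) mean trajectory, which solves dE[X]/dt = -alpha (E[X] - mu). The symbols `x0`, `mu` and `alpha` are illustrative assumptions and are not taken from the paper, which derives its own rates from the substitution model.

```python
import math


def ou_mean(x0: float, mu: float, alpha: float, t: float) -> float:
    """Expected trait value at time t under an OU process.

    x0    : starting trait value (assumed symbol)
    mu    : value the mean is pulled toward (assumed symbol)
    alpha : strength of the linear restoring force (assumed symbol)
    """
    return mu + (x0 - mu) * math.exp(-alpha * t)
```

The mean starts at `x0` and relaxes exponentially toward `mu`; a larger `alpha` gives faster relaxation.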


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0255376
Author(s):  
Li Xiaoru ◽  
Guo Ling

The development of information technology has produced massive amounts of data, which has brought severe challenges to information storage. Traditional electronic storage media cannot keep up with the ever-increasing demand for data storage, but in its place DNA has emerged as a feasible storage medium with high density, large storage capacity and strong durability. In DNA data storage, many different approaches can be used to encode data into codewords. DNA coding is a key step in DNA storage and can directly affect storage performance and data integrity. However, since errors are prone to occur in DNA synthesis and sequencing, and non-specific hybridization is prone to occur in the solution, how to effectively encode DNA has become an urgent problem to be solved. In this article, we propose a DNA storage coding method based on the equilibrium optimization random search (EORS) algorithm, which meets the Hamming distance, GC content and no-runlength constraints and can reduce the error rate in storage. Simulation experiments have shown that the size of the DNA storage code set constructed by the EORS algorithm that meets the combination constraints has increased by an average of 11% compared with previous work. The increase in the code set means that shorter DNA chains can be used to store more data.
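
As a baseline for what "constructing a code set under combined constraints" means, the sketch below builds a set greedily in lexicographic order. It is not the EORS algorithm described in the abstract, only a simple stand-in; the function name and the GC target of floor(n / 2) are assumptions.

```python
from itertools import product


def greedy_code_set(n: int, d: int) -> list[str]:
    """Greedily collect length-n DNA words that satisfy all three constraints."""
    code: list[str] = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        # No-runlength: reject adjacent identical bases.
        if any(a == b for a, b in zip(word, word[1:])):
            continue
        # Assumed GC constraint: exactly floor(n / 2) bases are G or C.
        if sum(ch in "GC" for ch in word) != n // 2:
            continue
        # Hamming distance: keep the word only if it is at least d away
        # from every codeword accepted so far.
        if all(sum(x != y for x, y in zip(word, other)) >= d for other in code):
            code.append(word)
    return code
```

The size of the returned set is a lower bound on the largest possible code for the given `n` and `d`. Exhaustive enumeration grows as 4^n, which is why heuristic searches are used for longer words.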


Author(s):  
Allegra Angeloni ◽  
Ozren Bogdanovic

In vertebrates, cytosine-guanine (CpG) dinucleotides are predominantly methylated, with ∼80% of all CpG sites containing 5-methylcytosine (5mC), a repressive mark associated with long-term gene silencing. The exceptions to such a globally hypermethylated state are CpG-rich DNA sequences called CpG islands (CGIs), which are mostly hypomethylated relative to the bulk genome. CGIs overlap promoters from the earliest vertebrates to humans, indicating a concerted evolutionary drive compatible with CGI retention. CGIs are characterised by DNA sequence features that include DNA hypomethylation, elevated CpG and GC content and the presence of transcription factor binding sites. These sequence characteristics are congruous with the recruitment of transcription factors and chromatin modifying enzymes, and transcriptional activation in general. CGIs colocalize with sites of transcriptional initiation in hypermethylated vertebrate genomes; however, a growing body of evidence indicates that CGIs might exert their gene regulatory function in other genomic contexts. In this review, we discuss the diverse regulatory features of CGIs, their functional readout, and the evolutionary implications associated with CGI retention in vertebrates and possibly in invertebrates.
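
"Elevated CpG and GC content" is commonly quantified with the observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G). The sketch below computes that ratio and the GC fraction for one sequence. The function names are hypothetical, and the review itself does not prescribe this particular formula or any threshold.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both bases; report 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return seq.count("CG") * len(seq) / (c * g)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```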


2019 ◽  
Vol 19 (2) ◽  
pp. 139-145 ◽  
Author(s):  
Bote Lv ◽  
Juan Chen ◽  
Boyan Liu ◽  
Cuiying Dong

Introduction: It is well known that the biogeography-based optimization (BBO) algorithm lacks search power in some circumstances. Materials & Methods: To address this issue, an adaptive opposition-based biogeography-based optimization algorithm (AO-BBO) is proposed. Building on the BBO algorithm and an opposition-based learning strategy, the algorithm chooses a different opposition-learning probability for each individual according to its habitat suitability index (HSI), so as to prevent elite individuals from returning to local optima. The proposed method is tested on 9 benchmark functions. Results: The results show that AO-BBO better maintains population diversity and strengthens the search for the global optimum. The global exploration capability, convergence rate and convergence accuracy are significantly improved. Finally, the algorithm is applied to the parameter optimization of a soft-sensing model of the plant medicine extraction rate. Conclusion: The simulation results show that the model obtained by this method has higher prediction accuracy and generalization ability.
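
The core of opposition-based learning is the opposite point x' = lower + upper - x, evaluated alongside the original candidate. The sketch below shows only that basic step for a minimization problem. The HSI-dependent probability that makes AO-BBO adaptive is not reproduced here, and the function names are hypothetical.

```python
from typing import Callable, Sequence


def opposite(x: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> list[float]:
    """Reflect a candidate through the centre of its search bounds."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_step(
    x: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    fitness: Callable[[Sequence[float]], float],
) -> list[float]:
    """Keep whichever of the candidate and its opposite has lower fitness."""
    x_opp = opposite(x, lower, upper)
    return x_opp if fitness(x_opp) < fitness(x) else list(x)
```

With a sphere function as fitness and bounds [0, 10], the candidate `[9.0]` is replaced by its opposite `[1.0]`, which lies closer to the optimum at the origin.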


Author(s):  
Prachi Agrawal ◽  
Talari Ganesh ◽  
Ali Wagdy Mohamed

Abstract: This article proposes a novel binary version of the recently developed Gaining Sharing knowledge-based optimization algorithm (GSK) to solve binary optimization problems. The GSK algorithm is based on the concept of how humans acquire and share knowledge during their life span. The binary version, named novel binary Gaining Sharing knowledge-based optimization algorithm (NBGSK), relies mainly on two binary stages: a binary junior gaining sharing stage and a binary senior gaining sharing stage, both with knowledge factor 1. These two stages enable NBGSK to explore and exploit the search space efficiently and effectively when solving problems in binary space. Moreover, to enhance the performance of NBGSK and prevent solutions from becoming trapped in local optima, NBGSK with population size reduction (PR-NBGSK) is introduced. It decreases the population size gradually with a linear function. The proposed NBGSK and PR-NBGSK are applied to a set of knapsack instances with small and large dimensions; the results show that NBGSK and PR-NBGSK are more efficient and effective in terms of convergence, robustness, and accuracy.
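
The abstract states only that the population shrinks "gradually with a linear function". One common way to write such a schedule is a straight line from a maximum size at the first generation to a minimum size at the last. The sketch below assumes that form; the parameter names and the rounding rule are illustrative, not taken from the paper.

```python
def population_size(generation: int, max_generations: int, n_max: int, n_min: int) -> int:
    """Linearly interpolate the population size from n_max down to n_min."""
    if not 0 <= generation <= max_generations:
        raise ValueError("generation must lie in [0, max_generations]")
    fraction = generation / max_generations
    return round(n_max + (n_min - n_max) * fraction)
```

An optimizer using this schedule would drop its worst individuals at the end of each generation until the population matches the returned size.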


Author(s):  
Antonina Diachenko ◽  
Yilia Palamarchuk ◽  
Mykola Maievsky ◽  
Serhii Ilchenko ◽  
Eduard Syvokhop ◽  
...  

Based on monitoring of Internet resources and an analysis of scientific-methodical, special and reference literature, the members of the research group established that too few scientific and methodological works address the implementation (determination of effectiveness) of modern scientific tools and technical means of training that support a system of long-term training of athletes specializing in martial arts. This calls for further scientific research and underlines the relevance and practical component of the chosen area of research. The main purpose of the research is to determine the effectiveness of modern scientific tools ("VKS Katsumoto" and "Visual 3D") in the system of long-term training of wrestlers (using the example of athletes who specialize in Sambo wrestling). The following research methods were used in the research and analytical work: abstraction, analysis and synthesis, induction and deduction, modeling, mathematical and statistical methods, etc. As a result of the empirical research, the effectiveness of the modern scientific tools "VKS Katsumoto" and "Visual 3D" in the system of long-term training of athletes specializing in Sambo (sports and combat direction) has been determined. Prospects for further research in this direction include a comparative analysis of the performance of Ukrainian sambo wrestlers at the 2021 World Cup using modern scientific tools (technical teaching aids).


2021 ◽  
Vol 16 ◽  
Author(s):  
Ruiheng Li ◽  
Qiong Zhuang ◽  
Nian Yu ◽  
Ruiyou Li ◽  
Huaiqing Zhang

Background: Recently, particle swarm optimization (PSO) has been increasingly used in geophysics due to its simple operation and fast convergence. Objective: However, PSO lacks population diversity and may fall into local optima. Hence, an improved hybrid particle swarm optimizer with sine-cosine acceleration coefficients (IH-PSO-SCAC) is proposed and successfully applied to test functions and to transient electromagnetic (TEM) nonlinear inversion. Method: A reverse learning strategy is applied to optimize population initialization. The sine-cosine acceleration coefficients are utilized for global convergence. Sine mapping is adopted to enhance population diversity during the search process. In addition, the mutation method is used to reduce the probability of premature convergence. Results: The application of IH-PSO-SCAC to the test functions and to several simple layered models is demonstrated, with satisfactory results in terms of data fit. Two inversions have been carried out to test our algorithm. The first model contains an underground low-resistivity anomaly body, and the second model uses measured data from a profile of the Xishan landslide in Sichuan Province. In both cases, resistivity profiles are obtained, and the inverse problem is solved for verification. Conclusion: The results show that the IH-PSO-SCAC algorithm is practical, can be effectively applied in TEM inversion and is superior to other representative algorithms in terms of stability and accuracy.
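
The "sine mapping" mentioned in the Method can be illustrated with the standard chaotic sine map, x_{k+1} = (a / 4) sin(pi x_k). The sketch below generates such a sequence and scales it to a search interval. The control parameter `a = 4`, the function names, and the way the sequence would feed into the swarm are all assumptions, since the abstract does not specify them.

```python
import math


def sine_map_sequence(x0: float, length: int, a: float = 4.0) -> list[float]:
    """Iterate the sine map, returning `length` values in [0, 1]."""
    values = []
    x = x0
    for _ in range(length):
        x = (a / 4.0) * math.sin(math.pi * x)
        values.append(x)
    return values


def scale_to_bounds(values: list[float], lower: float, upper: float) -> list[float]:
    """Map values from [0, 1] onto the interval [lower, upper]."""
    return [lower + (upper - lower) * v for v in values]
```

A chaotic sequence of this kind is often used in place of uniform random numbers to spread particles across the search range.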


Author(s):  
Jaeho Jeong ◽  
Seong-Joon Park ◽  
Jae-Won Kim ◽  
Jong-Seon No ◽  
Ha Hyeon Jeon ◽  
...  

Motivation: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it requires more sequence reads for data retrieval. There is potentially a way to improve the sequencing and decoding processes so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In past research, clustering, alignment, and decoding were treated as separate stages, but we believe that using the information from all these processes together may improve decoding performance. Actual experiments of DNA synthesis and sequencing should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances. Results: For DNA storage systems using fountain code and Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process focusing on the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich’s research, the proposed decoding method additionally incorporates sequence reads with minor errors, which had previously been discarded, and thus was able to make use of 10.6–11.9% more sequence reads from the same sequencing environment, which resulted in a 6.5–8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well. Availability: The raw data files and the source codes of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
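
One simple way to picture Hamming-distance based clustering is a greedy pass in which each read joins the first cluster whose representative is within a distance threshold. The sketch below is an illustration under that assumption and is not the code from the linked repository; the function names and the threshold are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads: list[str], threshold: int) -> list[list[str]]:
    """Greedy clustering; the first read of each cluster is its representative."""
    clusters: list[list[str]] = []
    for read in reads:
        for cluster in clusters:
            representative = cluster[0]
            # Only compare reads of equal length; Hamming distance is
            # undefined otherwise.
            if len(representative) == len(read) and hamming(representative, read) <= threshold:
                cluster.append(read)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([read])
    return clusters
```

Reads with insertions or deletions change length and would land in separate clusters here, so a practical pipeline would need an additional alignment or discarding step for them.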


2021 ◽  
Author(s):  
Amit Kumar ◽  
Malyaj R Prajapati ◽  
Surendra Upadhyay ◽  
Anamika Bhordia ◽  
Vinod Kumar Singh ◽  
...  

Abstract: The present report communicates the first complete genome sequence of a Brucella abortus 2308 strain isolated from an abortion storm on a dairy farm located at Kanpur, Uttar Pradesh, India. It caused last-trimester abortions in 32 of 100 cows on the farm over a period of 60 days. The bacteria were isolated in pure culture from the placenta of aborted cows. The genome sequence of the isolated bacteria is 3,285,606 bp long with a 57.25% GC content, an N50 value of 296,426 and an L50 value of 4, and contains 3,119 coding DNA sequences (CDSs), 49 tRNAs, 1 transfer messenger RNA (tmRNA), and 3 rRNA genes. It is the first report of Brucella abortus 2308 isolation and complete genome sequence from the Indian subcontinent.
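
The assembly statistics quoted here (GC content, N50, L50) follow standard definitions: N50 is the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half the assembly size, and L50 is the number of contigs needed to get there. A minimal sketch, with hypothetical function names and toy inputs rather than the reported genome:

```python
def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no contigs given")


def gc_percent(seq: str) -> float:
    """Percentage of bases that are G or C."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)
```

For contigs of lengths 2 through 10, the total is 54, and the three longest contigs (10, 9, 8) sum to 27, so N50 is 8 and L50 is 3.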

