SparkSW: Scalable Distributed Computing System for Large-Scale Biological Sequence Alignment

Author(s):  
Guoguang Zhao ◽  
Cheng Ling ◽  
Donghong Sun
Author(s):  
Steve Sawyer ◽  
William Gibbons

This teaching case describes the efforts of one department in a large organization to migrate from an internally developed, mainframe-based, computing system to a system based on purchased software running on a client/server architecture. The case highlights issues with large scale software implementations such as those demanded by enterprise resource package (ERP) installations. Often, the ERP selected by an organization does not have all the required functionality. This demands purchasing and installing additional packages (known colloquially as “bolt-ons”) to provide the needed functionality. These implementations lead to issues regarding oversight of the technical architecture, both project and technology governance, and user department capability for managing the installation of new systems.


2010 ◽  
Vol 34-35 ◽  
pp. 1911-1915
Author(s):  
Jun Tang

Because the web is not only a platform for information exchange but also a computational platform built on the JavaScript engine, any Internet-connected computer with a modern browser can access the web and execute JavaScript programs. Under these conditions, we develop a lightweight distributed computing system based on web and JavaScript technologies. Our system acts as an intermediary between the IT expert who has to solve a large-scale computational problem and end users on the Internet. In other words, people can easily cooperate with each other to solve complicated computational problems with the support of our system.


2021 ◽  
Vol 251 ◽  
pp. 02038
Author(s):  
Lene Kristian Bryngemark ◽  
David Cameron ◽  
Valentina Dutta ◽  
Thomas Eichlersmith ◽  
Balazs Konya ◽  
...  

Particle physics experiments rely extensively on computing and data services, making e-infrastructure an integral part of the research collaboration. Constructing and operating distributed computing can however be challenging for a smaller-scale collaboration. The Light Dark Matter eXperiment (LDMX) is a planned small-scale accelerator-based experiment to search for dark matter in the sub-GeV mass region. Finalizing the design of the detector relies on Monte-Carlo simulation of expected physics processes. A distributed computing pilot project was proposed to better utilize available resources at the collaborating institutes, and to improve scalability and reproducibility. This paper outlines the chosen lightweight distributed solution, presenting requirements, the component integration steps, and the experiences using a pilot system for tests with large-scale simulations. The system leverages existing technologies wherever possible, minimizing the need for software development, and deploys only non-intrusive components at the participating sites. The pilot proved that integrating existing components can dramatically reduce the effort needed to build and operate a distributed e-infrastructure, making it attainable even for smaller research collaborations.


2016 ◽  
Vol 17 (S9) ◽  
Author(s):  
Haidong Lan ◽  
Yuandong Chan ◽  
Kai Xu ◽  
Bertil Schmidt ◽  
Shaoliang Peng ◽  
...  

2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Fudong Liu ◽  
Zheng Shan ◽  
Yihang Chen

Nonnegative matrix factorization (NMF) decomposes a high-dimensional nonnegative matrix into the product of two lower-dimensional nonnegative matrices. However, conventional NMF neither scales to large datasets, because it keeps all data in memory, nor preserves the geometrical structure of the data, which some practical tasks require. In this paper, we propose a parallel NMF with manifold regularization method (PNMF-M) that overcomes these deficiencies by parallelizing manifold-regularized NMF on a distributed computing system. In particular, PNMF-M distributes both data samples and factor matrices to multiple computing nodes instead of loading the whole dataset on a single node, and updates both factor matrices locally on each node. In this way, PNMF-M relieves the memory pressure of large-scale datasets and speeds up the computation through parallelization. To construct the adjacency matrix for manifold regularization, we propose a two-step distributed graph construction method, which is proved to be equivalent to the batch construction method. Experimental results on popular text corpora and image datasets demonstrate that PNMF-M significantly improves both the scalability and the time efficiency of conventional NMF thanks to parallelization on a distributed computing system; meanwhile, it significantly enhances the representation ability of conventional NMF thanks to the incorporated manifold regularization.
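
The idea of partitioning data samples across nodes can be illustrated with a minimal sketch. It is not the PNMF-M algorithm: it assumes plain NMF with the standard multiplicative updates for the squared-error objective, omits the manifold regularization term, and keeps the basis matrix `w` shared rather than distributed. All function names are hypothetical, and the "nodes" are simulated by looping over column blocks in one process.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mult_update(target, numer, denom, eps=1e-9):
    """Elementwise multiplicative update: target * numer / (denom + eps)."""
    return [[t * n / (d + eps) for t, n, d in zip(rt, rn, rd)]
            for rt, rn, rd in zip(target, numer, denom)]

def sq_error(v_blocks, w, h_blocks):
    """Total squared reconstruction error summed over all sample blocks."""
    total = 0.0
    for v, h in zip(v_blocks, h_blocks):
        approx = matmul(w, h)
        total += sum((x - y) ** 2
                     for rv, ra in zip(v, approx) for x, y in zip(rv, ra))
    return total

def partitioned_nmf_step(v_blocks, w, h_blocks):
    """One update step; each (v, h) pair plays the role of one node's data."""
    wt = transpose(w)
    wtw = matmul(wt, w)
    new_h, vht_sum, hht_sum = [], None, None
    for v, h in zip(v_blocks, h_blocks):
        # Local work: update this block's coefficients using only local data.
        h = mult_update(h, matmul(wt, v), matmul(wtw, h))
        ht = transpose(h)
        # Small partial sums are the only values that must be aggregated.
        vht, hht = matmul(v, ht), matmul(h, ht)
        vht_sum = vht if vht_sum is None else add(vht_sum, vht)
        hht_sum = hht if hht_sum is None else add(hht_sum, hht)
        new_h.append(h)
    # Global work: update the shared basis from the aggregated sums.
    w = mult_update(w, vht_sum, matmul(w, hht_sum))
    return w, new_h

def demo(seed=0, m=6, k=2, block_sizes=(4, 5), steps=30):
    rng = random.Random(seed)
    v_blocks = [[[rng.random() for _ in range(n)] for _ in range(m)]
                for n in block_sizes]
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h_blocks = [[[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
                for n in block_sizes]
    before = sq_error(v_blocks, w, h_blocks)
    for _ in range(steps):
        w, h_blocks = partitioned_nmf_step(v_blocks, w, h_blocks)
    return before, sq_error(v_blocks, w, h_blocks), w, h_blocks
```

In this sketch no block ever needs another block's samples: the only cross-block communication is the sum of two small matrices, which is what makes a sample-wise partition attractive when the full dataset does not fit on one node.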


2016 ◽  
Vol 26 (04) ◽  
pp. 1750066
Author(s):  
Lamiche Chaabane ◽  
Moussaoui Abdelouahab

Multiple sequence alignment (MSA) is one of the most essential operations in biological sequence analysis, where it is used to construct evolutionary trees for DNA sequences and to analyze protein structures to help design new proteins. In this study, a new method for solving the sequence alignment problem, named improved tabu search (ITS), is proposed. The algorithm is based on the classical tabu search (TS) optimization technique. ITS is implemented to obtain multiple sequence alignments. Several variants of neighborhood generation and intensification/diversification strategies for the proposed ITS are investigated. Simulation results on a large collection of datasets show the efficacy of the developed approach and its capacity to achieve good-quality solutions in terms of scores compared with those given by other existing methods.
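
For readers unfamiliar with tabu search, the following is a minimal sketch of the classical technique applied to alignment. It is not the ITS method from the abstract. The sum-of-pairs scoring values, the gap-shifting neighborhood, the fixed tabu tenure, and all function names are assumptions chosen for illustration.

```python
from collections import deque
from itertools import combinations

GAP = "-"

def sp_score(rows, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score over all columns (gap-gap pairs score zero)."""
    score = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == GAP and b == GAP:
                continue
            if a == GAP or b == GAP:
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score

def neighbors(rows):
    """Yield alignments obtained by swapping one gap with an adjacent residue."""
    for r, row in enumerate(rows):
        for i in range(len(row) - 1):
            if (row[i] == GAP) != (row[i + 1] == GAP):
                swapped = row[:i] + row[i + 1] + row[i] + row[i + 2:]
                yield rows[:r] + (swapped,) + rows[r + 1:]

def tabu_search(seqs, iterations=200, tenure=20):
    """Return the best alignment found and its sum-of-pairs score."""
    width = max(len(s) for s in seqs)
    # Start from the trivial alignment: pad each sequence with trailing gaps.
    current = tuple(s.ljust(width, GAP) for s in seqs)
    best, best_score = current, sp_score(current)
    tabu = deque([current], maxlen=tenure)  # recently visited alignments
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        # Move to the best non-tabu neighbor even if it is worse than the
        # current alignment; this is what lets the search leave local optima.
        current = max(candidates, key=sp_score)
        tabu.append(current)
        score = sp_score(current)
        if score > best_score:
            best, best_score = current, score
    return best, best_score
```

Calling `tabu_search(["ACGT", "CGT", "ACT"])` returns a tuple of equal-length gapped rows together with its score. Storing whole alignments in the tabu list is the simplest choice; practical implementations often record move attributes instead to save memory.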

