SCBI_MapReduce, a New Ruby Task-Farm Skeleton for Automated Parallelisation and Distribution in Chunks of Sequences: The Implementation of a Boosted Blast+

2013 ◽  
Vol 2013 ◽  
pp. 1-12 ◽  
Author(s):  
Darío Guerrero-Fernández ◽  
Juan Falgueras ◽  
M. Gonzalo Claros

Current genomic analyses often require the management and comparison of big data using desktop bioinformatic software that was not developed with multicore distribution in mind. The task-farm SCBI_MAPREDUCE is intended to simplify the trivial parallelisation and distribution of new and legacy software and scripts for biologists who are interested in using computers but are not skilled programmers. In the case of legacy applications, there is no need to modify or rewrite the source code. It can be used from multicore workstations to heterogeneous grids. Tests have demonstrated that speed-up scales almost linearly and that distribution in small chunks increases it. It is also shown that SCBI_MAPREDUCE takes advantage of shared storage when necessary, is fault-tolerant, allows for resuming aborted jobs, does not need special hardware or virtual machine support, and provides the same results as the legacy software it parallelises. The same is true for interrupted and relaunched jobs. As a proof of concept, distribution of a compiled version of BLAST+ in the SCBI_DISTRIBUTED_BLAST gem is presented, showing that other BLAST binaries can be used while maintaining the same SCBI_DISTRIBUTED_BLAST code. Therefore, SCBI_MAPREDUCE suits most parallelisation and distribution needs in, for example, gene and genome studies.
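To make the task-farm idea concrete, the following is a minimal Python sketch of the chunk-based master/worker pattern the abstract describes. It is not the SCBI_MapReduce Ruby API; names such as `process_chunk` and `CHUNK_SIZE` are illustrative assumptions, and the worker stands in for a call to a legacy tool such as BLAST+.

```python
# Minimal sketch of the chunked task-farm pattern (illustration only; this is
# NOT the SCBI_MapReduce Ruby API). `process_chunk` and CHUNK_SIZE are
# hypothetical names for this example.
from multiprocessing import Pool

CHUNK_SIZE = 100  # sequences per work unit; small chunks keep all workers busy

def process_chunk(chunk):
    # Stand-in for the legacy tool (e.g., a BLAST+ run on a chunk of sequences).
    return [seq.upper() for seq in chunk]

def chunked(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

if __name__ == "__main__":
    sequences = ["acgt" * 10] * 1000           # placeholder input sequences
    with Pool() as farm:                        # one worker per available core
        results = farm.map(process_chunk, chunked(sequences, CHUNK_SIZE))
    merged = [r for chunk in results for r in chunk]  # reassemble in input order
    print(len(merged), "sequences processed")
```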

Author(s):  
A. G. Jackson ◽  
M. Rowe

Diffraction intensities from intermetallic compounds are, in the kinematic approximation, proportional to the scattering amplitude from the element doing the scattering. More detailed calculations have shown that site symmetry and occupation by various atom species also affects the intensity in a diffracted beam. [1] Hence, by measuring the intensities of beams, or their ratios, the occupancy can be estimated. Measurement of the intensity values also allows structure calculations to be made to determine the spatial distribution of the potentials doing the scattering. Thermal effects are also present as a background contribution. Inelastic effects such as loss or absorption/excitation complicate the intensity behavior, and dynamical theory is required to estimate the intensity value.The dynamic range of currents in diffracted beams can be 104or 105:1. Hence, detection of such information requires a means for collecting the intensity over a signal-to-noise range beyond that obtainable with a single film plate, which has a S/N of about 103:1. Although such a collection system is not available currently, a simple system consisting of instrumentation on an existing STEM can be used as a proof of concept which has a S/N of about 255:1, limited by the 8 bit pixel attributes used in the electronics. Use of 24 bit pixel attributes would easily allowthe desired noise range to be attained in the processing instrumentation. The S/N of the scintillator used by the photoelectron sensor is about 106 to 1, well beyond the S/N goal. The trade-off that must be made is the time for acquiring the signal, since the pattern can be obtained in seconds using film plates, compared to 10 to 20 minutes for a pattern to be acquired using the digital scan. Parallel acquisition would, of course, speed up this process immensely.
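The bit-depth figures above follow from simple arithmetic: an n-bit pixel attribute can represent at most 2^n - 1 levels, which bounds the achievable dynamic range. A quick check, for illustration only:

```python
# Quick check of the bit-depth arithmetic in the passage: the largest value an
# n-bit pixel attribute can represent bounds the achievable dynamic range.
for bits in (8, 24):
    full_scale = 2**bits - 1
    print(f"{bits}-bit pixel attribute -> dynamic range about {full_scale}:1")
# 8-bit  -> 255:1          (the proof-of-concept limit quoted above)
# 24-bit -> 16,777,215:1   (well beyond the 10^4-10^5:1 goal for diffracted beams)
```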


Author(s):  
Holger Gruen ◽  
Carsten Benthin ◽  
Sven Woop

We propose a simple, easy-to-integrate approach to accelerate ray tracing of alpha-tested transparent geometry, with a focus on the Microsoft® DirectX® and Vulkan® ray tracing extensions. Pre-computed bit masks are used to quickly determine fully transparent and fully opaque regions of triangles, thereby skipping the more expensive alpha-test operation. These bit masks allow us to skip up to 86% of all transparency tests, yielding up to a 40% speed-up in a proof-of-concept, software-only DirectX® implementation.
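The sketch below illustrates the general bit-mask idea in Python, not the paper's DirectX®/Vulkan® implementation: sub-regions of a triangle's alpha footprint are classified once as fully opaque, fully transparent, or mixed, and per-hit alpha tests are skipped whenever the hit falls in a region of known classification. The grid size, the `alpha_at` lookup, and the 0.5 cutoff are assumptions made for this example.

```python
# Illustrative sketch (not the paper's implementation) of pre-computed bit
# masks for alpha-tested triangles. Grid resolution N, the alpha_at lookup and
# the 0.5 alpha cutoff are assumptions.
N = 8  # N*N barycentric cells per triangle, so two N*N-bit masks

def build_masks(alpha_at, samples=4):
    """alpha_at(u, v) -> alpha in [0,1] at barycentric coords (u, v), u+v<=1."""
    opaque_mask = transparent_mask = 0
    for i in range(N):
        for j in range(N - i):                       # cells inside the triangle
            # crude sampling along the cell's anti-diagonal (stays inside the triangle)
            alphas = [alpha_at((i + s / samples) / N,
                               (j + (samples - 1 - s) / samples) / N)
                      for s in range(samples)]
            bit = 1 << (i * N + j)
            if min(alphas) >= 0.5:                   # every sample opaque
                opaque_mask |= bit
            elif max(alphas) < 0.5:                  # every sample transparent
                transparent_mask |= bit
    return opaque_mask, transparent_mask

def alpha_test(u, v, masks, alpha_at):
    """Per-hit test: consult the masks first, fall back to the texture fetch."""
    opaque_mask, transparent_mask = masks
    bit = 1 << (int(u * N) * N + int(v * N))
    if opaque_mask & bit:
        return True                                  # accept hit, no texture fetch
    if transparent_mask & bit:
        return False                                 # ignore hit, no texture fetch
    return alpha_at(u, v) >= 0.5                     # mixed cell: do the real test
```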


Author(s):  
Sreenu G. ◽  
M.A. Saleem Durai

Recent advances in hardware technology have made it possible to record transactions and other pieces of everyday-life information at a rapid pace. Beyond sheer speed and storage capacity, real-life observations also tend to change over time. Much prospective and highly useful value remains hidden in this vast volume of data. Conventional data mining is not suitable for such applications, so existing algorithms must be tuned and adapted, or new ones designed. Big data computing is emerging as one of the most promising technologies, pointing the way to new modes of thinking and decision making. The big data era lets users draw on all available data to obtain more precise analytical results, uncover latent information, and then make the best possible decisions. Drawing on a broad set of workloads, the authors establish a set of classifying measures based on the storage architecture, processing types, processing techniques, and the tools and technologies used.
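For illustration, a workload classified along the four dimensions named in the abstract might be recorded as below; the field values are hypothetical examples, not the authors' taxonomy.

```python
# Hypothetical sketch of a classification record along the four dimensions
# named in the abstract; the example values are assumptions, not the authors'
# actual taxonomy.
from dataclasses import dataclass, field
from typing import List

@dataclass
class WorkloadClassification:
    storage_architecture: str        # e.g. "distributed file system", "NoSQL store"
    processing_type: str             # e.g. "batch", "stream", "interactive"
    processing_technique: str        # e.g. "MapReduce", "in-memory dataflow"
    tools: List[str] = field(default_factory=list)  # e.g. ["Hadoop", "Spark"]

example = WorkloadClassification(
    storage_architecture="distributed file system",
    processing_type="batch",
    processing_technique="MapReduce",
    tools=["Hadoop"],
)
print(example)
```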


Big Data ◽  
2016 ◽  
pp. 1091-1109 ◽  
Author(s):  
Alba Amato ◽  
Salvatore Venticinque ◽  
Beniamino Di Martino

The digital revolution is changing the way culture and places can be experienced. It allows users to interact with their environment, creating an immense amount of data that can be used to better understand visitor behaviour, as well as to learn which parts of a visit create excitement or disappointment. In this context, Big Data becomes immensely important, making it possible to turn this mass of data into information, knowledge, and, ultimately, wisdom. This paper aims to model and design a scalable solution that integrates semantic techniques with Cloud and Big Data technologies to deliver context-aware services in the application domain of cultural heritage. The authors start from a baseline framework that was not originally conceived to scale to the huge workloads associated with big data. They provide an original formulation of the problem and an original software architecture that fulfills both functional and non-functional requirements. The authors present the technological stack and the implementation of a proof of concept.
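As a toy illustration of a context-aware service of the kind described, the sketch below matches a visitor's context against semantically tagged points of interest. The data model and scoring are assumptions made for this example, not the authors' design.

```python
# Toy sketch of a context-aware lookup: rank semantically tagged points of
# interest by overlap with the visitor's context. Illustrative only; not the
# authors' architecture or data model.
points_of_interest = [
    {"name": "Roman amphitheatre", "tags": {"archaeology", "architecture"}, "zone": "old town"},
    {"name": "Modern art gallery",  "tags": {"painting", "sculpture"},       "zone": "harbour"},
]

def recommend(context, pois):
    """Rank POIs in the visitor's current zone by overlap with their interests."""
    nearby = [p for p in pois if p["zone"] == context["zone"]]
    return sorted(nearby,
                  key=lambda p: len(p["tags"] & context["interests"]),
                  reverse=True)

visitor = {"zone": "old town", "interests": {"archaeology", "history"}}
for poi in recommend(visitor, points_of_interest):
    print(poi["name"])
```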


Author(s):  
Lerina Aversano ◽  
Raffaele Esposito ◽  
Teresa Mallardo ◽  
Maria Tortorella

In e-business, addressing technical issues alone is not enough to drive the evolution of existing legacy applications; it is also necessary to consider the strict relationship between the evolution of the legacy system and the evolution of the e-business process. To this end, this chapter proposes a strategy for extracting the requirements for a legacy system evolution from the requirements of the e-business process evolution. The strategy includes a toolkit composed of a set of decision tables and a measurement framework, both referring to the organization, business processes, and legacy software systems. The decision tables allow the identification of the processes to be evolved, the actions to be performed on them and their activities, and the strategies to be adopted for evolving the information systems. The measurement framework aims at achieving a greater understanding of the processes and related problems, taking into account organizational and technological issues.
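To show how a decision table of this kind can be encoded, here is a minimal sketch; the conditions and actions are hypothetical examples, not the chapter's actual rules.

```python
# Minimal illustration of how a decision table like those in the chapter's
# toolkit might be encoded; conditions and actions are hypothetical examples.
decision_table = [
    # (business value of the process, technical quality of the legacy system) -> action
    {"high_business_value": True,  "high_technical_quality": True,  "action": "maintain / ordinary evolution"},
    {"high_business_value": True,  "high_technical_quality": False, "action": "reengineer or migrate"},
    {"high_business_value": False, "high_technical_quality": True,  "action": "reuse components elsewhere"},
    {"high_business_value": False, "high_technical_quality": False, "action": "replace or retire"},
]

def decide(high_business_value, high_technical_quality):
    for rule in decision_table:
        if (rule["high_business_value"] == high_business_value
                and rule["high_technical_quality"] == high_technical_quality):
            return rule["action"]

print(decide(True, False))   # -> "reengineer or migrate"
```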


Author(s):  
Honglong Xu ◽  
Haiwu Rong ◽  
Rui Mao ◽  
Guoliang Chen ◽  
Zhiguang Shan

Big data is profoundly changing the lifestyles of people around the world in an unprecedented way. Driven by the requirements of applications across many industries, research on big data has been growing. Methods to manage and analyse big data in order to extract valuable information are the key to big data research. Starting from the variety challenge of big data, this dissertation proposes a universal big data management and analysis framework based on metric space. Within this framework, the Hilbert Index-based Outlier Detection (HIOD) algorithm is proposed. HIOD can handle all data types that can be abstracted to a metric space and achieves higher detection speed. Experimental results indicate that HIOD can effectively overcome the variety challenge of big data and achieves a 2.02× speed-up over iORCA on average and, in certain cases, up to 5.57×. Distance calculations are reduced by 47.57% on average and by up to 89.10%.
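The reason metric-space methods address the variety challenge is that they touch the data only through a distance function. The sketch below is a plain distance-based outlier scorer written to illustrate that point; it is not the HIOD algorithm (no Hilbert index, no pruning), and the k-th-nearest-neighbour scoring rule is an assumption for the example.

```python
# Generic distance-based outlier scoring over a metric space: the algorithm
# sees the data only through `dist`, so it applies to any data type that can
# be abstracted to a metric space. Illustration only; NOT the HIOD algorithm.
def knn_outlier_scores(items, dist, k=3):
    """Score each item by its distance to its k-th nearest neighbour."""
    scores = []
    for x in items:
        dists = sorted(dist(x, y) for y in items if y is not x)
        scores.append(dists[k - 1] if len(dists) >= k else float("inf"))
    return scores

# Works for strings as well as numbers, because only `dist` matters.
words = ["kitten", "sitten", "mitten", "bitten", "zzzzzzzzzz"]
edit_like = lambda a, b: sum(c != d for c, d in zip(a, b)) + abs(len(a) - len(b))
print(max(zip(knn_outlier_scores(words, edit_like), words)))  # the odd word out
```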


2016 ◽  
Vol 13 (4) ◽  
pp. 19-35 ◽  
Author(s):  
Lídice García Ríos ◽  
José Alberto Incera Diéguez

Sensor networks have seen extraordinary growth in recent years. Once confined to niche industrial and military applications, they are now deployed in a wide range of settings as sensors become smaller, cheaper, and easier to use. Sensor networks are a key player in the so-called Internet of Things, generating exponentially increasing amounts of data. Nonetheless, there are very few documented works that tackle the challenges related to the collection, manipulation, and exploitation of the data generated by these networks. This paper presents a proposal for integrating Big Data tools (at rest and in motion) for gathering, storing, and analysing data generated by a sensor network that monitors air pollution levels in a city. The authors provide a proof of concept that combines Hadoop and Storm for data processing, storage, and analysis, and Arduino-based kits for constructing their sensor prototypes.
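The batch/stream split the authors describe can be illustrated with a simplified sketch: each sensor reading is both appended to long-term storage (the "at rest" path a Hadoop cluster would serve) and folded into a running aggregate (the "in motion" path a Storm topology would serve). This is plain Python with no Hadoop or Storm APIs; the field names and the alert threshold are assumptions for the example.

```python
# Simplified sketch of the batch ("at rest") vs. stream ("in motion") split.
# Illustration only; field names and the PM2.5 alert level are assumptions.
import json, time

ARCHIVE = []          # stand-in for the batch store (e.g. files later loaded into HDFS)
running_avg = {}      # stand-in for streaming state (e.g. held by a Storm bolt)

def ingest(reading):
    ARCHIVE.append(json.dumps(reading))                 # at-rest path: archive raw reading
    sensor = reading["sensor_id"]                       # in-motion path: update aggregate
    count, avg = running_avg.get(sensor, (0, 0.0))
    avg = (avg * count + reading["pm25"]) / (count + 1)
    running_avg[sensor] = (count + 1, avg)
    if avg > 35.0:                                      # hypothetical alert level
        print(f"alert: sensor {sensor} average PM2.5 = {avg:.1f}")

ingest({"sensor_id": "node-01", "pm25": 42.0, "ts": time.time()})
```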

