scholarly journals An Efficient Data Retrieval Parallel Reeb Graph Algorithm

Algorithms ◽  
2020 ◽  
Vol 13 (10) ◽  
pp. 258
Author(s):  
Mustafa Hajij ◽  
Paul Rosen

The Reeb graph of a scalar function that is defined on a domain gives a topologically meaningful summary of that domain. Reeb graphs have been shown in the past decade to be of great importance in geometric processing, image processing, computer graphics, and computational topology. The demand for analyzing large data sets has increased in the last decade. Hence, the parallelization of topological computations needs to be more fully considered. We propose a parallel augmented Reeb graph algorithm on triangulated meshes with and without a boundary. That is, in addition to our parallel algorithm for computing a Reeb graph, we describe a method for extracting the original manifold data from the Reeb graph structure. We demonstrate the running time of our algorithm on standard datasets. As an application, we show how our algorithm can be utilized in mesh segmentation algorithms.

2001 ◽  
Vol 79 (7) ◽  
pp. 1209-1231 ◽  
Author(s):  
Rich Mooi

The fossil record of the Echinodermata is relatively complete, and is represented by specimens retaining an abundance of features comparable to that found in extant forms. This yields a half-billion-year record of evolutionary novelties unmatched in any other major group, making the Echinodermata a primary target for studies of biological change. Not all of this change can be understood by studying the rocks alone, leading to synthetic research programs. Study of literature from the past 20 years indicates that over 1400 papers on echinoderm paleontology appeared in that time, and that overall productivity has remained almost constant. Analysis of papers appearing since 1990 shows that research is driven by new finds including, but not restricted to, possible Precambrian echinoderms, bizarre new edrioasteroids, early crinoids, exquisitely preserved homalozoans, echinoids at the K-T boundary, and Antarctic echinoids, stelleroids, and crinoids. New interpretations of echinoderm body wall homologies, broad-scale syntheses of embryological information, the study of developmental trajectories through molecular markers, and the large-scale ecological and phenotypic shifts being explored through morphometry and analyses of large data sets are integrated with study of the fossils themselves. Therefore, recent advances reveal a remarkable and continuing synergistic expansion in our understanding of echinoderm evolutionary history.


2006 ◽  
Vol 1 (2) ◽  
pp. 289-292 ◽  
Author(s):  
Anne Booth

We live in an age of increasingly abundant statistical information. The advent of more large data sets obtained from household surveys, as well as from population censuses, labour force surveys, economic censuses and so on, has facilitated reasonably accurate estimates of income and expenditures for households in many parts of the world. These estimates can in turn be used to estimate a number of distributional indicators, as well as estimates of relative and absolute poverty. In addition better census coverage has permitted estimates of infant and child mortality rates, life expectancies, literacy rates and indicators of educational attainment. Such data have in turn been used to estimate composite indicators of wellbeing such as the Human Development Index, not just for entire countries but often for regions within countries as well.


Author(s):  
Ratchakoon Pruengkarn ◽  
◽  
Kok Wai Wong ◽  
Chun Che Fung

Data mining is the analytics and knowledge discovery process of analyzing large volumes of data from various sources and transforming the data into useful information. Various disciplines have contributed to its development and is becoming increasingly important in the scientific and industrial world. This article presents a review of data mining techniques and applications from 1996 to 2016. Techniques are divided into two main categories: predictive methods and descriptive methods. Due to the huge number of publications available on this topic, only a selected number are used in this review to highlight the developments of the past 20 years. Applications are included to provide some insights into how each data mining technique has evolved over the last two decades. Recent research trends focus more on large data sets and big data. Recently there have also been more applications in area of health informatics with the advent of newer algorithms.


2021 ◽  
Author(s):  
Renato Augusto Ferreira Lima ◽  
Andrea Sanchez-Tapia ◽  
Sara R. Mortara ◽  
Hans Steege ◽  
Marinez F. Siqueira

Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, bio-geography, macro-ecology, and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting and standardization of the data, leaving these tasks to the final users of the species records (e.g. taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users to perform those tasks, we introduce plantR an open-source package that provides a comprehensive toolbox to manage species records from biological collections. The package is accompanied by the proposal of a reproducible workflow to manage this type of data in taxonomy, ecology and biodiversity conservation. It is implemented in R and designed to handle relatively large data sets as fast as possible. Initially designed to handle plant species records, many of the plantR features also apply to other groups of organisms, given that the data structure is similar. The plantR workflow includes tools to (1) download records from different data repositories, (2) standardize typical fields associated with species records, (3) validate the locality, geographical coordinates, taxonomic nomenclature and species identifications, including the retrieval of duplicates across collections, and (4) summarize and export records, including the construction of species checklists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above. But in addition to the new features and resources related to the data editing and validation, the greatest strength of plantR is to provide a comprehensive and user-friendly workflow in one single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers to better assess data quality and avoid data leakage in a wide variety of studies using species records.


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 257
Author(s):  
Halima Samra ◽  
Alice Li ◽  
Ben Soh

There are few sources from which to obtain clinical and genetic data for use in research in Saudi Arabia. Numerous obstacles led to the difficulty of integrating these data from silos and scattered sources to provide standardized access to large data sets for patients with common health conditions. To this end, we sought to contribute to this area and offer a practical and easy-to-implement solution. In this paper, we aim to design and implement a “not only SQL” (NoSQL) based integration framework to generate an Integrated Data Repository of Genetic Disorders Data (GENE2D) to integrate data from various genetic clinics and research centers in Saudi Arabia and provide an easy-to-use query interface for researchers to conduct their studies on large datasets. The major components involved in the GENE2D architecture consists of the data sources, the integrated data repository (IDR) as a central database, and the application interface. The IDR uses a NoSQL document store via MongoDB (an open source document-oriented database program) as a backend database. The application interface called Query Builder provides multiple services for data retrieval from the database using a custom query to answer simple or complex research questions. The GENE2D system demonstrates its potential to help grow and develop a national genetic disorders database in Saudi Arabia.


Informatica ◽  
2011 ◽  
Vol 22 (4) ◽  
pp. 507-520 ◽  
Author(s):  
Viktor Medvedev ◽  
Gintautas Dzemyda ◽  
Olga Kurasova ◽  
Virginijus Marcinkevičius

Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets which are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024 channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89eV (20 channels/eV) are represented. During processing the spectra are either subtracted to create an artifact corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive, x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multi-variate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific ocean in the summer of 1990. The mid-Pacific aerosol provides information on long range particle transport, iron deposition, sea salt ageing, and halogen chemistry.Aerosol particle data sets suffer from a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero, because of finite detection limits. Many of the clusters show considerable overlap, because of natural variability, agglomeration, and chemical reactivity.


Sign in / Sign up

Export Citation Format

Share Document