An Efficient Data Retrieval Parallel Reeb Graph Algorithm

Mustafa Hajij; Paul Rosen

doi:10.3390/a13100258

Algorithms ◽

10.3390/a13100258 ◽

2020 ◽

Vol 13 (10) ◽

pp. 258

Author(s):

Mustafa Hajij ◽

Paul Rosen

Keyword(s):

Large Data ◽

Graph Algorithm ◽

Data Retrieval ◽

Scalar Function ◽

Large Data Sets ◽

Data Sets ◽

Reeb Graph ◽

The Past ◽

Segmentation Algorithms ◽

Efficient Data

The Reeb graph of a scalar function that is defined on a domain gives a topologically meaningful summary of that domain. Reeb graphs have been shown in the past decade to be of great importance in geometric processing, image processing, computer graphics, and computational topology. The demand for analyzing large data sets has increased in the last decade. Hence, the parallelization of topological computations needs to be more fully considered. We propose a parallel augmented Reeb graph algorithm on triangulated meshes with and without a boundary. That is, in addition to our parallel algorithm for computing a Reeb graph, we describe a method for extracting the original manifold data from the Reeb graph structure. We demonstrate the running time of our algorithm on standard datasets. As an application, we show how our algorithm can be utilized in mesh segmentation algorithms.
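
As a rough illustration of the kind of structure a Reeb graph records, the sketch below labels the vertices of a closed triangulated surface as minima, maxima, saddles, or regular points by counting the connected components of each vertex's lower and upper link. For a function with distinct vertex values, the non-regular vertices are where the nodes of the Reeb graph sit. This is not the parallel algorithm proposed in the paper; the names `count_components` and `classify_vertices` are hypothetical, and the sketch assumes a mesh without boundary and distinct scalar values.

```python
from collections import defaultdict


def count_components(vertices, edges):
    """Count connected components of the graph induced on `vertices` (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        # Only edges with both endpoints inside the vertex set count.
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


def classify_vertices(triangles, f):
    """Label each vertex of a closed triangle mesh by its lower/upper link.

    Assumes (for illustration) that the mesh has no boundary and that
    all values in `f` are distinct.
    """
    # For each vertex, collect the edges of its link: the opposite edge
    # of every triangle that contains the vertex.
    link_edges = defaultdict(list)
    for a, b, c in triangles:
        link_edges[a].append((b, c))
        link_edges[b].append((a, c))
        link_edges[c].append((a, b))

    labels = {}
    for v, edges in link_edges.items():
        neighbors = {u for edge in edges for u in edge}
        lower = {u for u in neighbors if f[u] < f[v]}
        upper = neighbors - lower
        n_lower = count_components(lower, edges)
        n_upper = count_components(upper, edges)
        if n_lower == 0:
            labels[v] = "minimum"
        elif n_upper == 0:
            labels[v] = "maximum"
        elif n_lower == 1 and n_upper == 1:
            labels[v] = "regular"
        else:
            labels[v] = "saddle"
    return labels


# Toy input: the surface of a tetrahedron with a height value per vertex.
tetra = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
heights = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
labels = classify_vertices(tetra, heights)
```

On the tetrahedron this yields one minimum, one maximum, and two regular vertices, which matches the Euler characteristic of a sphere (minima minus saddles plus maxima equals 2). Boundary vertices would need a separate rule, which this sketch does not attempt.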

Not all written in stone: interdisciplinary syntheses in echinoderm paleontology

Canadian Journal of Zoology ◽

10.1139/z00-217 ◽

2001 ◽

Vol 79 (7) ◽

pp. 1209-1231 ◽

Cited By ~ 16

Author(s):

Rich Mooi

Keyword(s):

Evolutionary History ◽

Large Scale ◽

Body Wall ◽

Developmental Trajectories ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Primary Target ◽

The Past ◽

Broad Scale

The fossil record of the Echinodermata is relatively complete, and is represented by specimens retaining an abundance of features comparable to that found in extant forms. This yields a half-billion-year record of evolutionary novelties unmatched in any other major group, making the Echinodermata a primary target for studies of biological change. Not all of this change can be understood by studying the rocks alone, leading to synthetic research programs. Study of literature from the past 20 years indicates that over 1400 papers on echinoderm paleontology appeared in that time, and that overall productivity has remained almost constant. Analysis of papers appearing since 1990 shows that research is driven by new finds including, but not restricted to, possible Precambrian echinoderms, bizarre new edrioasteroids, early crinoids, exquisitely preserved homalozoans, echinoids at the K-T boundary, and Antarctic echinoids, stelleroids, and crinoids. New interpretations of echinoderm body wall homologies, broad-scale syntheses of embryological information, the study of developmental trajectories through molecular markers, and the large-scale ecological and phenotypic shifts being explored through morphometry and analyses of large data sets are integrated with study of the fossils themselves. Therefore, recent advances reveal a remarkable and continuing synergistic expansion in our understanding of echinoderm evolutionary history.

Living standards in the past: new perspectives on well-being in Asia and Europe By Robert C. Allen, Tommy Bengtsson and Martin Dribe, eds. Oxford: Oxford University Press, 2005. Pp. xxii + 472. ISBN 0-19-928068-1

Journal of Global History ◽

10.1017/s1740022806000179 ◽

2006 ◽

Vol 1 (2) ◽

pp. 289-292 ◽

Cited By ~ 2

Author(s):

Anne Booth

Keyword(s):

Labour Force ◽

Large Data ◽

Well Being ◽

Living Standards ◽

Large Data Sets ◽

Data Sets ◽

Household Surveys ◽

Absolute Poverty ◽

Oxford University ◽

The Past

We live in an age of increasingly abundant statistical information. The advent of more large data sets obtained from household surveys, as well as from population censuses, labour force surveys, economic censuses and so on, has facilitated reasonably accurate estimates of income and expenditures for households in many parts of the world. These estimates can in turn be used to estimate a number of distributional indicators, as well as estimates of relative and absolute poverty. In addition, better census coverage has permitted estimates of infant and child mortality rates, life expectancies, literacy rates and indicators of educational attainment. Such data have in turn been used to estimate composite indicators of wellbeing such as the Human Development Index, not just for entire countries but often for regions within countries as well.

A Review of Data Mining Techniques and Applications

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2017.p0031 ◽

2017 ◽

Vol 21 (1) ◽

pp. 31-48 ◽

Cited By ~ 2

Author(s):

Ratchakoon Pruengkarn ◽

Kok Wai Wong ◽

Chun Che Fung

Keyword(s):

Data Mining ◽

Health Informatics ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Huge Number ◽

Data Mining Technique ◽

Data Mining Techniques ◽

The Past ◽

Number Of Publications

Data mining is the analytics and knowledge discovery process of analyzing large volumes of data from various sources and transforming the data into useful information. Various disciplines have contributed to its development, and it is becoming increasingly important in the scientific and industrial world. This article presents a review of data mining techniques and applications from 1996 to 2016. Techniques are divided into two main categories: predictive methods and descriptive methods. Due to the huge number of publications available on this topic, only a selected number are used in this review to highlight the developments of the past 20 years. Applications are included to provide some insights into how each data mining technique has evolved over the last two decades. Recent research trends focus more on large data sets and big data. Recently there have also been more applications in the area of health informatics with the advent of newer algorithms.

plantR: An R package and workflow for managing species records from biological collections

10.1101/2021.04.06.437754 ◽

2021 ◽

Author(s):

Renato Augusto Ferreira Lima ◽

Andrea Sanchez-Tapia ◽

Sara R. Mortara ◽

Hans Steege ◽

Marinez F. Siqueira

Keyword(s):

Biodiversity Conservation ◽

Large Data ◽

Data Retrieval ◽

R Package ◽

Large Data Sets ◽

Data Sets ◽

Data Repositories ◽

Biological Collections ◽

Data Editing ◽

User Friendly

Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, bio-geography, macro-ecology, and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting and standardization of the data, leaving these tasks to the final users of the species records (e.g. taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users to perform those tasks, we introduce plantR, an open-source package that provides a comprehensive toolbox to manage species records from biological collections. The package is accompanied by the proposal of a reproducible workflow to manage this type of data in taxonomy, ecology and biodiversity conservation. It is implemented in R and designed to handle relatively large data sets as fast as possible. Initially designed to handle plant species records, many of the plantR features also apply to other groups of organisms, given that the data structure is similar. The plantR workflow includes tools to (1) download records from different data repositories, (2) standardize typical fields associated with species records, (3) validate the locality, geographical coordinates, taxonomic nomenclature and species identifications, including the retrieval of duplicates across collections, and (4) summarize and export records, including the construction of species checklists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above. But in addition to the new features and resources related to the data editing and validation, the greatest strength of plantR is to provide a comprehensive and user-friendly workflow in one single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers to better assess data quality and avoid data leakage in a wide variety of studies using species records.

GENE2D: A NoSQL Integrated Data Repository of Genetic Disorders Data

Healthcare ◽

10.3390/healthcare8030257 ◽

2020 ◽

Vol 8 (3) ◽

pp. 257

Author(s):

Halima Samra ◽

Alice Li ◽

Ben Soh

Keyword(s):

Saudi Arabia ◽

Genetic Disorders ◽

Large Data ◽

Data Retrieval ◽

Large Data Sets ◽

Data Repository ◽

Data Sets ◽

Integration Framework ◽

Database Program ◽

Multiple Services

There are few sources from which to obtain clinical and genetic data for use in research in Saudi Arabia. Numerous obstacles have made it difficult to integrate these data from silos and scattered sources to provide standardized access to large data sets for patients with common health conditions. To this end, we sought to contribute to this area and offer a practical and easy-to-implement solution. In this paper, we aim to design and implement a “not only SQL” (NoSQL) based integration framework to generate an Integrated Data Repository of Genetic Disorders Data (GENE2D) to integrate data from various genetic clinics and research centers in Saudi Arabia and provide an easy-to-use query interface for researchers to conduct their studies on large datasets. The major components of the GENE2D architecture are the data sources, the integrated data repository (IDR) as a central database, and the application interface. The IDR uses a NoSQL document store via MongoDB (an open source document-oriented database program) as a backend database. The application interface, called Query Builder, provides multiple services for data retrieval from the database using a custom query to answer simple or complex research questions. The GENE2D system demonstrates its potential to help grow and develop a national genetic disorders database in Saudi Arabia.

Efficient Data Projection for Visual Analysis of Large Data Sets Using Neural Networks

Informatica ◽

10.15388/informatica.2011.339 ◽

2011 ◽

Vol 22 (4) ◽

pp. 507-520 ◽

Cited By ~ 9

Author(s):

Viktor Medvedev ◽

Gintautas Dzemyda ◽

Olga Kurasova ◽

Virginijus Marcinkevičius

Keyword(s):

Neural Networks ◽

Visual Analysis ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Efficient Data

Clustering techniques for large data sets---from the past to the future

Tutorial notes of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '99 ◽

10.1145/312179.312189 ◽

1999 ◽

Cited By ~ 11

Author(s):

Daniel A. Keim ◽

Alexander Hinneburg

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Clustering Techniques ◽

The Past ◽

The Future

An example of spectrum imaging used for comparison of EELS quantitative analysis techniques on Al-Li

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010008794x ◽

1991 ◽

Vol 49 ◽

pp. 726-727

Author(s):

John A. Hunt

Keyword(s):

Quantitative Analysis ◽

Large Data ◽

Difference Spectrum ◽

Large Data Sets ◽

Foil Thickness ◽

Data Sets ◽

Analysis Techniques ◽

Spectrum Imaging ◽

Normal Spectrum ◽

Electron Energy Loss

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets which are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
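
The figures quoted in this abstract can be checked with a little arithmetic, and the two processing paths can be sketched on plain lists. In the minimal sketch below, the 2 bytes per channel is an assumption (the abstract does not state the storage format); it is used because it reproduces the quoted 25 Mbytes. The function names are hypothetical, the direction of the energy offset between the two spectra is assumed, and the subtraction is taken channel by channel as one plausible reading of "subtracted".

```python
# Values taken from the abstract.
PIXELS_X = 80
PIXELS_Y = 80
SPECTRA_PER_PIXEL = 2
CHANNELS = 1024
CHANNELS_PER_EV = 20
OFFSET_EV = 1

# Assumption: each channel is stored as a 16-bit (2-byte) count.
BYTES_PER_CHANNEL = 2

total_bytes = (PIXELS_X * PIXELS_Y * SPECTRA_PER_PIXEL
               * CHANNELS * BYTES_PER_CHANNEL)
size_mib = total_bytes / 2**20

# A 1 eV offset at 20 channels/eV corresponds to a 20-channel shift.
shift_channels = OFFSET_EV * CHANNELS_PER_EV


def difference_spectrum(first, second):
    """Subtract the two recorded spectra channel by channel."""
    return [a - b for a, b in zip(first, second)]


def normal_spectrum(first, second, shift):
    """Remove the energy offset, then add the spectra over their overlap.

    Assumption: channel i of `second` covers the same energy as
    channel i + shift of `first`.
    """
    return [first[i + shift] + second[i] for i in range(len(first) - shift)]


# Toy spectra, only to exercise the two functions.
first = list(range(CHANNELS))
second = list(range(CHANNELS))
diff = difference_spectrum(first, second)
summed = normal_spectrum(first, second, shift_channels)
```

Under the 2-byte assumption the spectrum-image occupies 26,214,400 bytes, which is exactly 25 MiB, and the aligned sum keeps the 1004 channels where the two spectra overlap.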

Cluster analysis for large data sets: applications to individual aerosol particles from the mid-pacific

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100132078 ◽

1992 ◽

Vol 50 (2) ◽

pp. 1488-1489

Author(s):

Thomas W. Shattuck ◽

James R. Anderson ◽

Neil W. Tindale ◽

Peter R. Buseck

Keyword(s):

Cluster Analysis ◽

Chemical Reactivity ◽

Large Data ◽

Large Data Sets ◽

Particle Analysis ◽

Data Sets ◽

Halogen Chemistry ◽

Complete Study ◽

Components Analysis ◽

Automated Scanning

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive, x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multi-variate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific ocean in the summer of 1990. The mid-Pacific aerosol provides information on long range particle transport, iron deposition, sea salt ageing, and halogen chemistry. Aerosol particle data sets suffer from a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero, because of finite detection limits. Many of the clusters show considerable overlap, because of natural variability, agglomeration, and chemical reactivity.

Faculty Opinions recommendation of Detecting novel associations in large data sets.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13805958.793484294 ◽

2014 ◽

Author(s):

Daniel Lee

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Novel Associations
