Bitmap indexes for large scientific data sets: a case study

Parallel membership queries on very large scientific data sets using bitmap indexes

Concurrency and Computation: Practice and Experience

10.1002/cpe.5157

2019

Vol 31 (15)

pp. e5157

Cited By ~ 2

Author(s):

Beytullah Yildiz

Kesheng Wu

Suren Byna

Arie Shoshani

Keyword(s):

Scientific Data

Data Sets

Membership Queries

Bitmap Indexes
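
This entry pairs membership queries with bitmap indexes. The following is a rough illustration of the idea only, not the authors' parallel implementation. A membership query such as `x IN {v1, v2, ...}` over an equality-encoded bitmap index can be answered by OR-ing the bitmaps of the requested values. The sketch assumes one uncompressed bitmap per distinct value, stored as a Python integer. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def build_bitmap_index(column: Sequence[int]) -> Dict[int, int]:
    """Equality-encoded bitmap index: one bitmap per distinct value.

    Bit r of bitmaps[v] is set when column[r] == v. Python ints serve as
    arbitrary-length, uncompressed bitmaps (an assumption of this sketch).
    """
    bitmaps: Dict[int, int] = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def membership_query(bitmaps: Dict[int, int], values: Iterable[int]) -> int:
    """Bitmap of rows whose value is in `values`: bitwise OR of per-value bitmaps."""
    result = 0
    for v in values:
        result |= bitmaps.get(v, 0)  # values absent from the data contribute nothing
    return result


def rows_of(bitmap: int) -> List[int]:
    """Decode a bitmap into the ascending list of row numbers it marks."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


if __name__ == "__main__":
    index = build_bitmap_index([3, 1, 4, 1, 5, 9, 2, 6])
    print(rows_of(membership_query(index, {1, 9})))  # [1, 3, 5]
```

The OR is computed bit by bit, so disjoint row ranges can be evaluated independently and their results combined afterward. This is one natural route to parallelism. Production bitmap indexes usually also compress their bitmaps (for example with word-aligned hybrid, WAH, encoding), which this sketch omits.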

Segment-Based Approach for Assessing Hazard Risk of Coastal Highways in Hawai‘i

Transportation Research Record: Journal of the Transportation Research Board

10.1177/0361198118821679

2019

Vol 2673 (1)

pp. 83-91

Cited By ~ 2

Author(s):

Harrison Togia

Oceana P. Francis

Karl Kim

Guohui Zhang

Keyword(s):

Qualitative Method

Data Sets

Regional Environment

The Road

Hazard Exposure

Multiple Indicators

System Data

Rural Highway

Geographic Information System Data

Hazards to roadways and travelers can differ drastically from one region to another because hazards depend largely on the regional environment and climate. This paper describes the development of a qualitative method for assessing infrastructure importance and hazard exposure for rural highway segments in Hawai‘i under different conditions. Multiple indicators of roadway importance are considered, including traffic volume, population served, accessibility, connectivity, reliability, land use, and roadway connection to critical infrastructures, such as hospitals and police stations. The method of evaluating roadway hazards and importance can be tailored to fit different regional hazard scenarios. It assimilates data from diverse sources to estimate risks of disruption. A case study is conducted for Highway HI83 in Hawai‘i, which is exposed to multiple hazards. Weakening of the road by coastal erosion, inundation from sea-level rise, and rockfall all require adaptation solutions. By analyzing the risk of disruption to highway segments, adaptation approaches can be prioritized. Because it uses readily available geographic information system data sets for the exposure and impacts of potential hazards, this method could be adapted not only for emergency management but also for planning, design, and engineering of resilient highways.
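
The abstract describes rating highway segments on importance indicators and hazard exposure, then prioritizing adaptation by risk of disruption. It does not give a scoring rule. The following minimal sketch makes several assumptions, none of which come from the paper:

- ordinal ratings run from 1 (low) to 3 (high);
- all importance indicators carry equal weight;
- a segment's worst hazard sets its exposure;
- risk is importance times exposure.

The names `Segment` and `risk_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """A highway segment with hypothetical qualitative ratings (1 = low .. 3 = high)."""
    name: str
    importance: Dict[str, int]  # e.g. traffic volume, population served, hospital access
    exposure: Dict[str, int]    # e.g. coastal erosion, sea-level-rise inundation, rockfall


def importance_score(seg: Segment) -> float:
    # Assumption: all importance indicators are weighted equally.
    return sum(seg.importance.values()) / len(seg.importance)


def exposure_score(seg: Segment) -> int:
    # Assumption: the worst hazard on the segment dominates its exposure.
    return max(seg.exposure.values())


def risk_score(seg: Segment) -> float:
    # Assumption: risk of disruption is importance multiplied by exposure.
    return importance_score(seg) * exposure_score(seg)


def prioritize(segments: List[Segment]) -> List[Segment]:
    """Order segments from highest to lowest risk score."""
    return sorted(segments, key=risk_score, reverse=True)
```

Summing hazard ratings instead of taking the maximum would favor segments exposed to many moderate hazards. Which rule fits depends on the regional hazard scenario, which the abstract says the method is meant to accommodate.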

Ontology-Based Correlation Detection Among Heterogeneous Data Sets: A Case Study of University Campus Issues

2020 IEEE Third International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)

10.1109/aike48582.2020.00014

2020

Author(s):

Yuto Tsukagoshi

Shusaku Egami

Yuichi Sei

Yasuyuki Tahara

Akihiko Ohsuga

Keyword(s):

Heterogeneous Data

Data Sets

University Campus

ECMO Therapy in Acute Chest Syndrome for Patients with Sickle Cell Disease: a Case Report and Literature Review

SN Comprehensive Clinical Medicine

10.1007/s42399-021-00987-0

2021

Author(s):

Soi Avgeridou

Ilija Djordjevic

Anton Sabashnikov

Kaveh Eghbalzadeh

Laura Suhr

...

Keyword(s):

Sickle Cell Disease

Sickle Cell

Scientific Data

Acute Chest Syndrome

Data Sets

Limited Data

Cell Disease

Life Saving

Institutional Experience

ECMO Therapy

Extracorporeal membrane oxygenation (ECMO) plays an important role as a life-saving tool for patients with therapy-refractory cardio-respiratory failure. For rare and infrequent indications in particular, scientific data are scarce. This paper focuses primarily on our institutional experience with a 19-year-old patient suffering from acute chest syndrome, a pathognomonic pulmonary condition in patients with sickle cell disease. After implementation of awake ECMO therapy, the patient was successfully weaned off support and discharged home 22 days after initiation of extracorporeal circulation. Given the limited data and current literature, further and larger data sets are needed to determine outcomes after ECMO therapy for this rare indication.

Narrative Scientific Data Visualization in an Immersive Environment

Bioinformatics

10.1093/bioinformatics/btab052

2021

Author(s):

Richen Liu

Hailong Wang

Chuyu Zhang

Xiaojian Chen

Lijun Wang

...

Keyword(s):

Data Visualization

Domain Knowledge

User Study

Scientific Visualization

Scientific Data

Immersive Environments

Related Information

New Strategy

Scientific Data Visualization

Motivation: Narrative visualization for scientific data exploration can help users better understand domain knowledge. Narrative visualizations often present a sequence of facts and observations linked by a unifying theme or argument. In immersive environments, narrative visualization can give users an intuitive way to explore scientific data interactively, because such environments offer a new strategy for interactive scientific data visualization and exploration. However, developing narrative scientific visualization in immersive environments is challenging. In this paper, we propose an immersive narrative visualization tool that lets ordinary users create and customize scientific data explorations, even with little programming knowledge of scientific visualization. Users can conveniently define POIs (points of interest) with the handheld controller of an immersive device. Results: Automatic exploration animations with narrative annotations can be generated from gradual transitions between consecutive POI pairs. Interactive slicing can also be controlled with the device controller. Evaluations, including a user study and a case study, were designed and conducted to show the usability and effectiveness of the proposed tool. Availability: Related information can be accessed at: https://dabigtou.github.io/richenliu/
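
The abstract says exploration animations are generated from gradual transitions between consecutive POI pairs. One simple way to produce such transitions is assumed here for illustration; it is not taken from the paper. The approach interpolates a camera's position and look-at target between consecutive POIs with an ease-in/ease-out (smoothstep) curve. The `POI` fields and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class POI:
    position: Vec3  # hypothetical camera position at this point of interest
    target: Vec3    # hypothetical point the camera looks at
    note: str       # narrative annotation attached to this POI


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linear interpolation between two 3-D points, t in [0, 1]."""
    return (a[0] + (b[0] - a[0]) * t,
            a[1] + (b[1] - a[1]) * t,
            a[2] + (b[2] - a[2]) * t)


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve with zero slope at t = 0 and t = 1."""
    return t * t * (3.0 - 2.0 * t)


def exploration_frames(pois: List[POI], frames_per_transition: int) -> List[Tuple[Vec3, Vec3, str]]:
    """Camera (position, target, annotation) for every frame of the tour."""
    if not pois:
        return []
    frames = []
    for start, end in zip(pois, pois[1:]):
        for f in range(frames_per_transition):
            t = smoothstep(f / frames_per_transition)
            # Assumption: show the departing POI's annotation during the transition.
            frames.append((lerp(start.position, end.position, t),
                           lerp(start.target, end.target, t),
                           start.note))
    last = pois[-1]
    frames.append((last.position, last.target, last.note))  # end on the final POI
    return frames
```

Easing makes the camera slow down near each POI, where an annotation would be read. Linear interpolation is used here only for simplicity. Smoother multi-POI paths (for example, splines) are a common alternative.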

Queue Length Forecasting in Complex Manufacturing Job Shops

Forecasting

10.3390/forecast3020021

2021

Vol 3 (2)

pp. 322-338

Author(s):

Marvin Carl May

Alexander Albers

Marc David Fischer

Florian Mayerhofer

Louis Schäfer

...

Keyword(s):

Operations Management

Queue Length

Manufacturing Systems

Production Control

Data Sets

Job Shops

Static Data

Queue Lengths

Operational Excellence

Manufacturing is currently characterized by increasing complexity on both the technical and organizational levels. Consequently, more complex and intelligent production control methods are being developed in order to remain competitive and achieve operational excellence. Operations management described early on the influence among target metrics such as queuing times, queue length, and production speed. However, accurate prediction of queue lengths has long been overlooked as a means of better understanding manufacturing systems. To provide queue length forecasts, this paper introduces a methodology to identify queue lengths in retrospect based on transitional data. It also compares easy-to-deploy machine learning-based queue forecasting models. Forecasting based on static data sets and on time series models is shown to work successfully in an exemplary semiconductor case study. The main findings show that accurate queue length prediction is feasible even with minimal available data, using a variety of techniques, which can enable further research and predictions.
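
The abstract mentions identifying queue lengths in retrospect from transitional data. Assume that data is a log recording, for each job, the time it entered a station's queue and the time it left (for example, when processing started). Under that assumption, an event sweep can reconstruct queue length over time. The input format and function names are hypothetical, and the paper's own procedure may differ.

```python
from bisect import bisect_right
from typing import List, Tuple


def queue_length_series(stays: List[Tuple[float, float]]) -> List[Tuple[float, int]]:
    """Step function of queue length from per-job (enter_queue, leave_queue) times.

    Returns (time, length) pairs; each length holds until the next pair's time.
    """
    events = []
    for enter, leave in stays:
        events.append((enter, +1))
        events.append((leave, -1))
    # Tuples sort by time, then delta, so departures (-1) precede arrivals (+1).
    events.sort()
    series: List[Tuple[float, int]] = []
    length = 0
    for time, delta in events:
        length += delta
        if series and series[-1][0] == time:
            series[-1] = (time, length)  # collapse simultaneous events into one step
        else:
            series.append((time, length))
    return series


def length_at(series: List[Tuple[float, int]], t: float) -> int:
    """Queue length at time t (0 before the first recorded event)."""
    times = [time for time, _ in series]
    i = bisect_right(times, t)
    return series[i - 1][1] if i else 0
```

Sampling `length_at` on a regular time grid would turn the reconstructed series into targets or lagged features for forecasting models like those the abstract compares.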

Generalizing Treatment Effect Estimates From Sample to Population: A Case Study in the Difficulties of Finding Sufficient Data

Evaluation Review

10.1177/0193841x16660663

2016

Vol 41 (4)

pp. 357-388

Cited By ~ 13

Author(s):

Elizabeth A. Stuart

Anna Rhodes

Keyword(s):

Head Start

Randomized Trial

Treatment Effect

External Validity

Randomized Trials

Target Population

Data Sets

Common Data Elements

Ex Post

Background: Given increasing concerns about the relevance of research to policy and practice, there is growing interest in assessing and enhancing the external validity of randomized trials: determining how useful a given randomized trial is for informing a policy question for a specific target population. Objectives: This article highlights recent advances in assessing and enhancing external validity, with a focus on the data needed to make ex post statistical adjustments to enhance the applicability of experimental findings to populations potentially different from their study sample. Research design: We use a case study to illustrate how to generalize treatment effect estimates from a randomized trial sample to a target population, in particular comparing the sample of children in a randomized trial of a supplemental program for Head Start centers (the Research-Based, Developmentally Informed study) to the national population of children eligible for Head Start, as represented in the Head Start Impact Study. Results: For this case study, common data elements between the trial sample and population were limited, making reliable generalization from the trial sample to the population challenging. Conclusions: To answer important questions about external validity, more publicly available data are needed. In addition, future studies should make an effort to collect measures similar to those in other data sets. Measure comparability between population data sets and randomized trials that use samples of convenience will greatly enhance the range of research and policy relevant questions that can be answered.
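
Post-stratification is one standard ex post adjustment of the kind this abstract discusses. It estimates the treatment effect within strata defined by covariates measured in both the trial and the population, then reweights those effects by the population's stratum shares. The sketch below is a generic illustration, not the authors' analysis, and its stratum labels and data are hypothetical. It also shows why common data elements matter: strata can only be built from variables available in both data sets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, bool, float]  # (stratum, treated?, outcome)


def stratum_effects(records: Iterable[Record]) -> Dict[str, float]:
    """Difference in mean outcome (treated minus control) within each stratum."""
    # stratum -> [treated sum, treated count, control sum, control count]
    acc = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for stratum, treated, outcome in records:
        a = acc[stratum]
        if treated:
            a[0] += outcome
            a[1] += 1
        else:
            a[2] += outcome
            a[3] += 1
    effects = {}
    for stratum, (t_sum, t_n, c_sum, c_n) in acc.items():
        if t_n == 0 or c_n == 0:
            raise ValueError(f"stratum {stratum!r} lacks a treated or control unit")
        effects[stratum] = t_sum / t_n - c_sum / c_n
    return effects


def poststratified_effect(records: Iterable[Record], population_shares: Dict[str, float]) -> float:
    """Population-weighted average of stratum-specific treatment effects."""
    effects = stratum_effects(records)
    missing = set(population_shares) - set(effects)
    if missing:
        # Population strata with no trial participants cannot be generalized to.
        raise ValueError(f"no trial data for strata: {sorted(missing)}")
    total = sum(population_shares.values())
    return sum(share * effects[s] for s, share in population_shares.items()) / total
```

The `ValueError` for missing strata mirrors the abstract's finding. When the trial and population share few measured variables, the adjustment either cannot be made or relies on strata too coarse to be convincing.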

Empirical Underidentification with the Bifactor Model: A Case Study

Educational and Psychological Measurement

10.1177/0013164417719947

2017

Vol 78 (5)

pp. 717-736

Cited By ~ 4

Author(s):

Samuel Green

Yanyun Yang

Keyword(s):

Structural Equation Model

Structural Equation

Equation Model

Bifactor Model

Data Sets

Bifactor Models

Bifactor models are commonly used to assess whether psychological and educational constructs underlie a set of measures. We consider empirical underidentification problems that are encountered when fitting particular types of bifactor models to certain types of data sets. The objective of the article was fourfold: (a) to allow readers to gain a better general understanding of issues surrounding empirical identification, (b) to offer insights into empirical underidentification with bifactor models, (c) to inform methodologists who explore bifactor models about empirical underidentification with these models, and (d) to propose strategies for structural equation model users to deal with underidentification problems that can emerge when applying bifactor models.
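
The following is a standard bifactor specification in general notation, not taken from the article. Each observed measure loads on one general factor G and on the specific factor of its group s(j). All factors are standardized and mutually uncorrelated:

```latex
x_j = \lambda_{jG}\, G + \lambda_{jS}\, S_{s(j)} + \varepsilon_j,
\qquad \operatorname{Var}(G) = \operatorname{Var}(S_k) = 1,
\qquad \operatorname{Cov}(G, S_k) = \operatorname{Cov}(S_k, S_l) = 0 \quad (k \neq l)

\operatorname{Cov}(x_i, x_j) =
\begin{cases}
\lambda_{iG}\lambda_{jG} + \lambda_{iS}\lambda_{jS}, & s(i) = s(j),\ i \neq j \\
\lambda_{iG}\lambda_{jG}, & s(i) \neq s(j)
\end{cases}
```

Specific-factor loadings enter the model-implied covariances only through pairs of measures that share a specific factor. Whether those loadings can be uniquely recovered therefore depends on the within-group covariances in the observed data. This is one intuition for why identification can fail empirically for particular data sets even when the model is identified in principle.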

Inflight icing data gathering during routine flight operations — a case study

The Aeronautical Journal

10.1017/s0001924000017942

2001

Vol 105 (1051)

pp. 501-516

Cited By ~ 3

Author(s):

A. P. Brown

Keyword(s):

Data Gathering

Winter Season

Leading Edge

General Aviation

Data Sets

Large Droplet

Event Data

Flight Operations

Lower Corner

For the purpose of the design and certification of inflight icing protection systems for transport and general aviation aircraft, the eventual re-definition/expansion of the icing environment of FAR 25/JAR 25, Appendix C is under consideration. Such a re-definition will be aided by gathering as much inflight icing event data as reasonably possible from widely different geographic locations. The results of a 12-month pilot programme of icing event data gathering are presented. Using non-instrumented turboprop aircraft flying mid-altitude routine air transport operations, the programme has gathered observational data from across the British Isles and central France. The observed icing events have been classified as ‘small’, ‘medium’ or ‘large’ droplet by observing a number of metrics. The main metric is the windscreen lower-corner ice impingement limit against an opposing corner vortex flow, supported by wing leading-edge impingement limits. Using the guidance of droplet trajectory modelling, MVD (median volume diameter) values for the three droplet size bins have been conjectured to be 15, 40 and 80 μm. Hence, the ‘large’ droplet category would exceed FAR/JAR 25, Appendix C. Data sets of 117 winter-season and 55 summer-season icing events have been statistically analysed. As defined above, the data sets include 11 winter and five summer large droplet icing encounters. Icing events included ‘sandpaper’ icing from short-duration ‘large’ droplet encounters, and a single ridge-formation icing event in ‘large’ droplet conditions. The frequency of ‘large’ droplet icing events amounted to 1 in 20 flight hours in winter and 1 in 35 flight hours in summer. These figures suggest ‘large’ droplet icing encounter probabilities perhaps substantially greater than previously considered. The ‘large’ droplet events were quite localised, with a mean scale size of about 6 nautical miles.
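
A quick calculation from the counts above shows that large-droplet encounters made up a similar share of icing events in both seasons. This is our own arithmetic, not a figure reported in the abstract:

```latex
\frac{11}{117} \approx 0.094 \quad (\text{winter}), \qquad \frac{5}{55} \approx 0.091 \quad (\text{summer})
```

Roughly 9% of recorded icing events in each season were classified as large droplet. The per-flight-hour frequencies differ: 1/20 = 0.05 per flight hour in winter versus 1/35 ≈ 0.029 in summer.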

The New Statistics with R

10.1093/oso/9780198798170.001.0001

2021

Cited By ~ 1

Author(s):

Andy Hector

Keyword(s):

Linear Model

Evolutionary Biology

Environmental Science

Research Training

Model Analysis

Scientific Data

Information Criteria

Reproducible Research

Data Sets

R Programming

Statistics is a fundamental component of the scientific toolbox, but learning the basics of this area of mathematics is one of the most challenging parts of a research training. This book gives an up-to-date introduction to the classical techniques and modern extensions of linear-model analysis—one of the most useful approaches in the analysis of scientific data in the life and environmental sciences. The book emphasizes an estimation-based approach that takes account of recent criticisms of overuse of probability values and introduces the alternative approach using information criteria. The book is based on the use of the open-source R programming language for statistics and graphics, which is rapidly becoming the lingua franca in many areas of science. This second edition adds new chapters, including one discussing some of the complexities of linear-model analysis and another introducing reproducible research documents using the R Markdown package. Statistics is introduced through worked analyses performed in R using interesting data sets from ecology, evolutionary biology, and environmental science. The data sets and R scripts are available as supporting material.
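
The book contrasts p-value-based inference with information criteria. The small illustration below uses Python rather than the book's R. It compares an intercept-only model with a straight-line model using the Gaussian AIC, n ln(RSS/n) + 2k, where k counts the estimated regression coefficients plus the residual variance. This form drops additive constants, so only differences between models fitted to the same data are meaningful. R's `AIC()` includes those constants but gives the same differences. The data values and function names are made up for illustration.

```python
import math
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """AIC of a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def compare_models(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (AIC of intercept-only model, AIC of straight-line model)."""
    n = len(y)
    my = sum(y) / n
    rss_null = sum((yi - my) ** 2 for yi in y)
    a, b = fit_line(x, y)
    rss_line = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # k = number of regression coefficients + 1 for the residual variance
    return gaussian_aic(rss_null, n, 2), gaussian_aic(rss_line, n, 3)
```

The model with the lower AIC is preferred. The 2k term penalizes the extra slope parameter, so the line wins only if it reduces the residual sum of squares enough to offset that penalty.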
