DeepDeath: Learning to Predict the Underlying Cause of Death with Big Data

Mapping Intimacies ◽

10.1101/134965 ◽

2017 ◽

Author(s):

Hamid Reza Hassanzadeh ◽

Ying Sha ◽

May D. Wang

Keyword(s):

Big Data ◽

Cause Of Death ◽

Large Scale ◽

Ad Hoc ◽

Short Term Memory ◽

Mortality Data ◽

Deep Model ◽

Large Populations ◽

Health Related ◽

N Gram

AbstractMultiple cause-of-death data provides a valuable source of information that can be used to enhance health standards by predicting health related trajectories in societies with large populations. These data are often available in large quantities across U.S. states and require Big Data techniques to uncover complex hidden patterns. We design two different classes of models suitable for large-scale analysis of mortality data, a Hadoop-based ensemble of random forests trained over N-grams, and the DeepDeath, a deep classifier based on the recurrent neural network (RNN). We apply both classes to the mortality data provided by the National Center for Health Statistics and show that while both perform significantly better than the random classifier, the deep model that utilizes long short-term memory networks (LSTMs), surpasses the N-gram based models and is capable of learning the temporal aspect of the data without a need for building ad-hoc, expert-driven features.

Download Full-text

Big Data Analytics & Artificial Intelligence in Healthcare

10.21203/rs.3.rs-150303/v1 ◽

2021 ◽

Author(s):

PRANJAL KUMAR ◽

Siddhartha Chauhan

Keyword(s):

Artificial Intelligence ◽

Big Data ◽

Large Scale ◽

Health Sector ◽

Big Data Analytics ◽

Personal Data ◽

Related Data ◽

Large Scale Data ◽

Ethical Policy ◽

Health Related

Abstract Big data analysis and Artificial Intelligence have received significant attention recently in creating more opportunities in the health sector for aggregating or collecting large-scale data. Today, our genomes and microbiomes can be sequenced i.e., all information exchanged between physicians and patients in Electronic Health Records (EHR) can be collected and traced at least theoretically. Social media and mobile devices today obviously provide many health-related data regarding activity, diets, social contacts, and so on. However, it is increasingly difficult to use this information to answer health questions and, in particular, because the data comes from various domains and lives in different infrastructures and of course it also is very variable quality. The massive collection and aggregation of personal data come with a number of ethical policy, methodological, technological challenges. It should be acknowledged that large-scale clinical evidence remains to confirm the promise of Big Data and Artificial Intelligence (AI) in health care. This paper explores the complexities of big data & artificial intelligence in healthcare as well as the benefits and prospects.

Download Full-text

Market2Dish: Health-aware Food Recommendation

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3418211 ◽

2021 ◽

Vol 17 (1) ◽

pp. 1-19

Author(s):

Wenjie Wang ◽

Ling-Yu Duan ◽

Hao Jiang ◽

Peiguang Jing ◽

Xuemeng Song ◽

...

Keyword(s):

Large Scale ◽

Interaction Mechanism ◽

User Preference ◽

Fine Grained ◽

Related Information ◽

Deep Model ◽

Obesity And Diabetes ◽

Health Related ◽

Rising Incidence ◽

Personalized Health

With the rising incidence of some diseases, such as obesity and diabetes, the healthy diet is arousing increasing attention. However, most existing food-related research efforts focus on recipe retrieval, user-preference-based food recommendation, cooking assistance, or the nutrition and calorie estimation of dishes, ignoring the personalized health-aware food recommendation. Therefore, in this work, we present a personalized health-aware food recommendation scheme, namely, Market2Dish, mapping the ingredients displayed in the market to the healthy dishes eaten at home. The proposed scheme comprises three components, namely, recipe retrieval, user health profiling, and health-aware food recommendation. In particular, recipe retrieval aims to acquire the ingredients available to the users and then retrieve recipe candidates from a large-scale recipe dataset. User health profiling is to characterize the health conditions of users by capturing the textual health-related information crawled from social networks. Specifically, to solve the issue that the health-related information is extremely sparse, we incorporate a word-class interaction mechanism into the proposed deep model to learn the fine-grained correlations between the textual tweets and pre-defined health concepts. For the health-aware food recommendation, we present a novel category-aware hierarchical memory network–based recommender to learn the health-aware user-recipe interactions for better food recommendation. Moreover, extensive experiments demonstrate the effectiveness of the health-aware food recommendation scheme.

Download Full-text

The Use of Medical Record Linkage for Population and Genetic Studies

Methods of Information in Medicine ◽

10.1055/s-0038-1635962 ◽

1969 ◽

Vol 08 (01) ◽

pp. 07-11 ◽

Cited By ~ 9

Author(s):

H. B. Newcombe

Keyword(s):

Record Linkage ◽

Large Scale ◽

Medical Record Linkage ◽

Canadian Province ◽

Genetic Studies ◽

Parental Characteristics ◽

Family Histories ◽

The Family ◽

Large Populations ◽

Machine Readable

Methods are described for deriving personal and family histories of birth, marriage, procreation, ill health and death, for large populations, from existing civil registrations of vital events and the routine records of ill health. Computers have been used to group together and »link« the separately derived records pertaining to successive events in the lives of the same individuals and families, rapidly and on a large scale. Most of the records employed are already available as machine readable punchcards and magnetic tapes, for statistical and administrative purposes, and only minor modifications have been made to the manner in which these are produced.As applied to the population of the Canadian province of British Columbia (currently about 2 million people) these methods have already yielded substantial information on the risks of disease: a) in the population, b) in relation to various parental characteristics, and c) as correlated with previous occurrences in the family histories.

Download Full-text

A NOVEL ANALYTIC APPROACH FOR LARGE SCALE POWER PLANT WIDE PROCESSES WITH BIG DATA

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.6.30 ◽

2020 ◽

Vol 9 (6) ◽

pp. 3509-3517

Author(s):

K. Malakonda Rayudu ◽

A. Kumar

Keyword(s):

Big Data ◽

Power Plant ◽

Large Scale ◽

Analytic Approach

Download Full-text

Multi Disease-Prediction Framework Using Hybrid Deep Learning: An Optimal Prediction Model (Preprint)

10.2196/preprints.22865 ◽

2020 ◽

Author(s):

Anusha Ampavathi ◽

Vijaya Saradhi T

Keyword(s):

Feature Extraction ◽

Big Data ◽

Deep Learning ◽

Weight Function ◽

Optimization Algorithm ◽

Large Scale ◽

Heuristic Algorithms ◽

Disease Prediction ◽

Health Care Decisions ◽

Proposed Model

UNSTRUCTURED Big data and its approaches are generally helpful for healthcare and biomedical sectors for predicting the disease. For trivial symptoms, the difficulty is to meet the doctors at any time in the hospital. Thus, big data provides essential data regarding the diseases on the basis of the patient’s symptoms. For several medical organizations, disease prediction is important for making the best feasible health care decisions. Conversely, the conventional medical care model offers input as structured that requires more accurate and consistent prediction. This paper is planned to develop the multi-disease prediction using the improvised deep learning concept. Here, the different datasets pertain to “Diabetes, Hepatitis, lung cancer, liver tumor, heart disease, Parkinson’s disease, and Alzheimer’s disease”, from the benchmark UCI repository is gathered for conducting the experiment. The proposed model involves three phases (a) Data normalization (b) Weighted normalized feature extraction, and (c) prediction. Initially, the dataset is normalized in order to make the attribute's range at a certain level. Further, weighted feature extraction is performed, in which a weight function is multiplied with each attribute value for making large scale deviation. Here, the weight function is optimized using the combination of two meta-heuristic algorithms termed as Jaya Algorithm-based Multi-Verse Optimization algorithm (JA-MVO). The optimally extracted features are subjected to the hybrid deep learning algorithms like “Deep Belief Network (DBN) and Recurrent Neural Network (RNN)”. As a modification to hybrid deep learning architecture, the weight of both DBN and RNN is optimized using the same hybrid optimization algorithm. Further, the comparative evaluation of the proposed prediction over the existing models certifies its effectiveness through various performance measures.

Download Full-text

Analysis of the Influence of Big Data Background on the Spread of Large-Scale Sports Events

Journal of Physics Conference Series ◽

10.1088/1742-6596/1744/3/032003 ◽

2021 ◽

Vol 1744 (3) ◽

pp. 032003

Author(s):

Tieniu Xia

Keyword(s):

Big Data ◽

Large Scale ◽

Sports Events

Download Full-text

NPU RGB+D Dataset and a Feature-Enhanced LSTM-DGCN Method for Action Recognition of Basketball Players

Applied Sciences ◽

10.3390/app11104426 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4426

Author(s):

Chunyan Ma ◽

Ji Fan ◽

Jinghao Yao ◽

Tao Zhang

Keyword(s):

Action Recognition ◽

Large Scale ◽

Short Term Memory ◽

Evaluation Criteria ◽

Image Data ◽

Basketball Player ◽

Basketball Players ◽

Convolutional Network ◽

Atomic Actions ◽

New Feature

Computer vision-based action recognition of basketball players in basketball training and competition has gradually become a research hotspot. However, owing to the complex technical action, diverse background, and limb occlusion, it remains a challenging task without effective solutions or public dataset benchmarks. In this study, we defined 32 kinds of atomic actions covering most of the complex actions for basketball players and built the dataset NPU RGB+D (a large scale dataset of basketball action recognition with RGB image data and Depth data captured in Northwestern Polytechnical University) for 12 kinds of actions of 10 professional basketball players with 2169 RGB+D videos and 75 thousand frames, including RGB frame sequences, depth maps, and skeleton coordinates. Through extracting the spatial features of the distances and angles between the joint points of basketball players, we created a new feature-enhanced skeleton-based method called LSTM-DGCN for basketball player action recognition based on the deep graph convolutional network (DGCN) and long short-term memory (LSTM) methods. Many advanced action recognition methods were evaluated on our dataset and compared with our proposed method. The experimental results show that the NPU RGB+D dataset is very competitive with the current action recognition algorithms and that our LSTM-DGCN outperforms the state-of-the-art action recognition methods in various evaluation criteria on our dataset. Our action classifications and this NPU RGB+D dataset are valuable for basketball player action recognition techniques. The feature-enhanced LSTM-DGCN has a more accurate action recognition effect, which improves the motion expression ability of the skeleton data.

Download Full-text

Examination of Construct Validity and Criterion-Related Validity of the German Motor Test in Egyptian Schoolchildren

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18168341 ◽

2021 ◽

Vol 18 (16) ◽

pp. 8341

Author(s):

Osama Abdelkarim ◽

Julian Fritsch ◽

Darko Jekauc ◽

Klaus Bös

Keyword(s):

Construct Validity ◽

Physical Fitness ◽

Large Scale ◽

Body Height ◽

Cross Sectional Study ◽

Latent Factors ◽

Cross Sectional ◽

First Order ◽

Motor Test ◽

Health Related

Physical fitness is an indicator for children’s public health status. Therefore, the aim of this study was to examine the construct validity and the criterion-related validity of the German motor test (GMT) in Egyptian schoolchildren. A cross-sectional study was conducted with a total of 931 children aged 6 to 11 years (age: 9.1 ± 1.7 years) with 484 (52%) males and 447 (48%) females in grades one to five in Assiut city. The children’s physical fitness data were collected using GMT. GMT is designed to measure five health-related physical fitness components including speed, strength, coordination, endurance, and flexibility of children aged 6 to 18 years. The anthropometric data were collected based on three indicators: body height, body weight, and BMI. A confirmatory factor analysis was conducted with IBM SPSS AMOS 26.0 using full-information maximum likelihood. The results indicated an adequate fit (χ2 = 112.3, df = 20; p < 0.01; CFI = 0.956; RMSEA = 0.07). The χ2-statistic showed significant results, and the values for CFI and RMSEA showed a good fit. All loadings of the manifest variables on the first-order latent factors as well as loadings of the first-order latent factors on the second-order superordinate factor were significant. The results also showed strong construct validity in the components of conditioning abilities and moderate construct validity in the components of coordinative abilities. GMT proved to be a valid method and could be widely used on large-scale studies for health-related fitness monitoring in the Egyptian population.

Download Full-text

Neural methods for effective, efficient, and exposure-aware information retrieval

ACM SIGIR Forum ◽

10.1145/3476415.3476434 ◽

2021 ◽

Vol 55 (1) ◽

pp. 1-2

Author(s):

Bhaskar Mitra

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Large Scale ◽

Web Search ◽

Real Life ◽

Inverted Index ◽

Information Need ◽

Product Model ◽

Performance Improvements ◽

Deep Model

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents---or short passages---in response to keyword-based queries. Effective IR systems must deal with query-document vocabulary mismatch problem, by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms---such as a person's name or a product model number---not seen during training, and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections---such as the document index of a commercial Web search engine---containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018]. Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017] which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model that enables large-scale precomputation and the use of inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.

Download Full-text

Measuring sleep regularity: Theoretical properties and practical usage of existing metrics

SLEEP ◽

10.1093/sleep/zsab103 ◽

2021 ◽

Author(s):

Dorothee Fischer ◽

Elizabeth B Klerman ◽

Andrew J K Phillips

Keyword(s):

Large Scale ◽

Research Question ◽

Group Differences ◽

Regularity Index ◽

Wake Patterns ◽

Health Related ◽

Phase Deviation ◽

Unbiased Estimates ◽

Large Scale Simulations ◽

Larger Sample

Abstract Study Objectives Sleep regularity predicts many health-related outcomes. Currently, however, there is no systematic approach to measuring sleep regularity. Traditionally, metrics have assessed deviations in sleep patterns from an individual’s average. Traditional metrics include intra-individual standard deviation (StDev), Interdaily Stability (IS), and Social Jet Lag (SJL). Two metrics were recently proposed that instead measure variability between consecutive days: Composite Phase Deviation (CPD) and Sleep Regularity Index (SRI). Using large-scale simulations, we investigated the theoretical properties of these five metrics. Methods Multiple sleep-wake patterns were systematically simulated, including variability in daily sleep timing and/or duration. Average estimates and 95% confidence intervals were calculated for six scenarios that affect measurement of sleep regularity: ‘scrambling’ the order of days; daily vs. weekly variation; naps; awakenings; ‘all-nighters’; and length of study. Results SJL measured weekly but not daily changes. Scrambling did not affect StDev or IS, but did affect CPD and SRI; these metrics, therefore, measure sleep regularity on multi-day and day-to-day timescales, respectively. StDev and CPD did not capture sleep fragmentation. IS and SRI behaved similarly in response to naps and awakenings but differed markedly for all-nighters. StDev and IS required over a week of sleep-wake data for unbiased estimates, whereas CPD and SRI required larger sample sizes to detect group differences. Conclusions Deciding which sleep regularity metric is most appropriate for a given study depends on a combination of the type of data gathered, the study length and sample size, and which aspects of sleep regularity are most pertinent to the research question.

Download Full-text