Final report for LDRD project 11-0029 : high-interest event detection in large-scale multi-modal data sets : proof of concept.

Abstract Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\rm o},\delta=47^{\rm o})$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$ , identified at $(\alpha=71^{\rm o},\delta=61^{\rm o})$ .

Download Full-text

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽

10.3390/a14050154 ◽

2021 ◽

Vol 14 (5) ◽

pp. 154

Author(s):

Marcus Walldén ◽

Masao Okita ◽

Fumihiko Ino ◽

Dimitris Drikakis ◽

Ioannis Kokkinakis

Keyword(s):

Large Scale ◽

Data Driven ◽

Data Sets ◽

Output Constraints ◽

Data Driven Approach ◽

Scientific Simulations ◽

Multiple Metrics ◽

In Transit ◽

Multiple Compression ◽

Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method expeditiously can identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.

Download Full-text

The International Radiomics Platform – An Initiative of the German and Austrian Radiological Societies

RöFo - Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren ◽

10.1055/a-1244-2775 ◽

2020 ◽

Author(s):

Daniel Overhoff ◽

Peter Kohlmann ◽

Alex Frydrychowicz ◽

Sergios Gatidis ◽

Christian Loewe ◽

...

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Image Data ◽

Quality Parameters ◽

Image Features ◽

Performance Criteria ◽

Data Sets ◽

Proof Of Concept ◽

Concept Study ◽

Artificial Intelligence Methods

Purpose The DRG-ÖRG IRP (Deutsche Röntgengesellschaft-Österreichische Röntgengesellschaft international radiomics platform) represents a web-/cloud-based radiomics platform based on a public-private partnership. It offers the possibility of data sharing, annotation, validation and certification in the field of artificial intelligence, radiomics analysis, and integrated diagnostics. In a first proof-of-concept study, automated myocardial segmentation and automated myocardial late gadolinum enhancement (LGE) detection using radiomic image features will be evaluated for myocarditis data sets. Materials and Methods The DRG-ÖRP IRP can be used to create quality-assured, structured image data in combination with clinical data and subsequent integrated data analysis and is characterized by the following performance criteria: Possibility of using multicentric networked data, automatically calculated quality parameters, processing of annotation tasks, contour recognition using conventional and artificial intelligence methods and the possibility of targeted integration of algorithms. In a first study, a neural network pre-trained using cardiac CINE data sets was evaluated for segmentation of PSIR data sets. In a second step, radiomic features were applied for segmental detection of LGE of the same data sets, which were provided multicenter via the IRP. Results First results show the advantages (data transparency, reliability, broad involvement of all members, continuous evolution as well as validation and certification) of this platform-based approach. In the proof-of-concept study, the neural network demonstrated a Dice coefficient of 0.813 compared to the expert's segmentation of the myocardium. In the segment-based myocardial LGE detection, the AUC was 0.73 and 0.79 after exclusion of segments with uncertain annotation.The evaluation and provision of the data takes place at the IRP, taking into account the FAT (fairness, accountability, transparency) and FAIR (findable, accessible, interoperable, reusable) criteria. Conclusion It could be shown that the DRG-ÖRP IRP can be used as a crystallization point for the generation of further individual and joint projects. The execution of quantitative analyses with artificial intelligence methods is greatly facilitated by the platform approach of the DRG-ÖRP IRP, since pre-trained neural networks can be integrated and scientific groups can be networked.In a first proof-of-concept study on automated segmentation of the myocardium and automated myocardial LGE detection, these advantages were successfully applied.Our study shows that with the DRG-ÖRP IRP, strategic goals can be implemented in an interdisciplinary way, that concrete proof-of-concept examples can be demonstrated, and that a large number of individual and joint projects can be realized in a participatory way involving all groups. Key Points: Citation Format

Download Full-text

Multi-task learning based Encoder-Decoder: A comprehensive detection and diagnosis system for multi-sensor data

Advances in Mechanical Engineering ◽

10.1177/16878140211013138 ◽

2021 ◽

Vol 13 (5) ◽

pp. 168781402110131

Author(s):

Junfeng Wu ◽

Li Yao ◽

Bin Liu ◽

Zheyuan Ding ◽

Lei Zhang

Keyword(s):

Anomaly Detection ◽

Event Detection ◽

Large Scale ◽

Multivariate Time Series ◽

Sensor Data ◽

Unified Framework ◽

Diagnosis System ◽

Learning Framework ◽

Task Learning ◽

Detection And Diagnosis

As more and more sensor data have been collected, automated detection, and diagnosis systems are urgently needed to lessen the increasing monitoring burden and reduce the risk of system faults. A plethora of researches have been done on anomaly detection, event detection, anomaly diagnosis respectively. However, none of current approaches can explore all these respects in one unified framework. In this work, a Multi-Task Learning based Encoder-Decoder (MTLED) which can simultaneously detect anomalies, diagnose anomalies, and detect events is proposed. In MTLED, feature matrix is introduced so that features are extracted for each time point and point-wise anomaly detection can be realized in an end-to-end way. Anomaly diagnosis and event detection share the same feature matrix with anomaly detection in the multi-task learning framework and also provide important information for system monitoring. To train such a comprehensive detection and diagnosis system, a large-scale multivariate time series dataset which contains anomalies of multiple types is generated with simulation tools. Extensive experiments on the synthetic dataset verify the effectiveness of MTLED and its multi-task learning framework, and the evaluation on a real-world dataset demonstrates that MTLED can be used in other application scenarios through transfer learning.

Download Full-text

Compartment and hub definitions tune metabolic networks for metabolomic interpretations

GigaScience ◽

10.1093/gigascience/giz137 ◽

2020 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

T Cameron Waller ◽

Jordan A Berg ◽

Alexander Lex ◽

Brian E Chapman ◽

Jared Rutter

Keyword(s):

Large Scale ◽

Metabolic Networks ◽

Shortest Paths ◽

Large Data ◽

Differential Regulation ◽

Large Data Sets ◽

Data Sets ◽

Human Metabolism ◽

Experimental Conditions ◽

Systemic Model

Abstract Background Metabolic networks represent all chemical reactions that occur between molecular metabolites in an organism’s cells. They offer biological context in which to integrate, analyze, and interpret omic measurements, but their large scale and extensive connectivity present unique challenges. While it is practical to simplify these networks by placing constraints on compartments and hubs, it is unclear how these simplifications alter the structure of metabolic networks and the interpretation of metabolomic experiments. Results We curated and adapted the latest systemic model of human metabolism and developed customizable tools to define metabolic networks with and without compartmentalization in subcellular organelles and with or without inclusion of prolific metabolite hubs. Compartmentalization made networks larger, less dense, and more modular, whereas hubs made networks larger, more dense, and less modular. When present, these hubs also dominated shortest paths in the network, yet their exclusion exposed the subtler prominence of other metabolites that are typically more relevant to metabolomic experiments. We applied the non-compartmental network without metabolite hubs in a retrospective, exploratory analysis of metabolomic measurements from 5 studies on human tissues. Network clusters identified individual reactions that might experience differential regulation between experimental conditions, several of which were not apparent in the original publications. Conclusions Exclusion of specific metabolite hubs exposes modularity in both compartmental and non-compartmental metabolic networks, improving detection of relevant clusters in omic measurements. Better computational detection of metabolic network clusters in large data sets has potential to identify differential regulation of individual genes, transcripts, and proteins.

Download Full-text

Spot event detection along a large-scale sensor based on ultra-weak fiber Bragg gratings using time–frequency analysis

Applied Optics ◽

10.1364/ao.55.001054 ◽

2016 ◽

Vol 55 (5) ◽

pp. 1054 ◽

Cited By ~ 2

Author(s):

Amelia Lavinia Ricchiuti ◽

Salvador Sales

Keyword(s):

Frequency Analysis ◽

Event Detection ◽

Large Scale ◽

Fiber Bragg Gratings ◽

Bragg Gratings ◽

Time Frequency Analysis ◽

Time Frequency

Download Full-text

High Performance Computational Analysis of Large-scale Proteome Data Sets to Assess Incremental Contribution to Coverage of the Human Genome

Journal of Proteome Research ◽

10.1021/pr400181q ◽

2013 ◽

Vol 12 (6) ◽

pp. 2858-2868 ◽

Cited By ~ 29

Author(s):

Nadin Neuhauser ◽

Nagarjuna Nagaraj ◽

Peter McHardy ◽

Sara Zanivan ◽

Richard Scheltema ◽

...

Keyword(s):

Human Genome ◽

High Performance ◽

Large Scale ◽

Computational Analysis ◽

Data Sets

Download Full-text

ON GENERATING DIGITAL ELEVATION MODELS FROM LIDAR DATA – RESOLUTION VERSUS ACCURACY AND TOPOGRAPHIC WETNESS INDEX INDICES IN NORTHERN PEATLANDS

Geodesy and Cartography ◽

10.3846/20296991.2012.702983 ◽

2012 ◽

Vol 38 (2) ◽

pp. 57-69 ◽

Cited By ~ 12

Author(s):

Abdulghani Hasan ◽

Petter Pilesjö ◽

Andreas Persson

Keyword(s):

Large Scale ◽

Drainage Area ◽

Data Sets ◽

Topographic Wetness Index ◽

Absolute Deviation ◽

Digital Elevation ◽

Elevation Data ◽

Scale Modelling ◽

Data Points ◽

Emission Modelling

Global change and GHG emission modelling are dependent on accurate wetness estimations for predictions of e.g. methane emissions. This study aims to quantify how the slope, drainage area and the TWI vary with the resolution of DEMs for a flat peatland area. Six DEMs with spatial resolutions from 0.5 to 90 m were interpolated with four different search radiuses. The relationship between accuracy of the DEM and the slope was tested. The LiDAR elevation data was divided into two data sets. The number of data points facilitated an evaluation dataset with data points not more than 10 mm away from the cell centre points in the interpolation dataset. The DEM was evaluated using a quantile-quantile test and the normalized median absolute deviation. It showed independence of the resolution when using the same search radius. The accuracy of the estimated elevation for different slopes was tested using the 0.5 meter DEM and it showed a higher deviation from evaluation data for steep areas. The slope estimations between resolutions showed differences with values that exceeded 50%. Drainage areas were tested for three resolutions, with coinciding evaluation points. The model ability to generate drainage area at each resolution was tested by pair wise comparison of three data subsets and showed differences of more than 50% in 25% of the evaluated points. The results show that consideration of DEM resolution is a necessity for the use of slope, drainage area and TWI data in large scale modelling.

Download Full-text