A Knowledge Graph-Based Data Integration Framework Applied to Battery Data Management

2021 · Vol 13 (3) · pp. 1583
Author(s): Tahir Emre Kalaycı, Bor Bricelj, Marko Lah, Franz Pichler, Matthias K. Scharrer, et al.

Today, the automotive and transportation sector is undergoing a transformation to meet the requirements of sustainable and efficient operations. This transformation reveals itself mainly in electric vehicles, hybrid electric vehicles, and electric vehicle sharing. The battery is the most significant, and most expensive, component of an electric vehicle, so battery management is crucial: behavior changes must be monitored constantly for operational purposes, and components and operations must be adjusted to these changes quickly. To address these challenges, we propose a knowledge graph-based data integration framework that simplifies access to and analysis of the data accumulated through the operation of vehicles and related transportation systems. The proposed framework aims to enable effortless analysis and navigation of the integrated knowledge, and the creation of additional data sets from this knowledge for use in data analysis and machine learning. The knowledge graph is the central component that simplifies the extraction, enrichment, exploration, and generation of data in this framework. We developed it following human-centered design so that it can serve the various roles of the data science and machine learning life cycle; its main objective is to streamline exploration of and interaction with the integrated data to maximize human productivity. Finally, we present a battery use case that shows the feasibility and benefits of the proposed framework, illustrating how it is used to extract knowledge from raw data, navigate and enrich that knowledge with additional knowledge, and generate data sets.
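The framework itself is described only at a high level in the abstract. As a hedged sketch of the core idea (all entity and property names below are invented for illustration and do not come from the paper), a knowledge graph can be modelled as a set of subject-predicate-object triples that integrates records from separate sources and is then queried to generate ML-ready data sets:

```python
# Toy in-memory triple store illustrating knowledge-graph-based data
# integration; names like "battery:42" or "capacityFade_pct" are hypothetical.
triples = set()

def add(s, p, o):
    triples.add((s, p, o))

def query(s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Knowledge extracted from one source (fleet telemetry)
add("battery:42", "installedIn", "vehicle:7")
add("battery:42", "chemistry", "NMC")
add("battery:42", "capacity_Ah", 60.0)

# Enrichment: a lab-test source links degradation data to the same entity
add("battery:42", "capacityFade_pct", 4.2)

# Data-set generation: one feature row per battery for downstream ML
rows = []
for (b, _, chem) in query(p="chemistry"):
    cap = query(s=b, p="capacity_Ah")[0][2]
    fade = query(s=b, p="capacityFade_pct")[0][2]
    rows.append({"battery": b, "chemistry": chem,
                 "capacity_Ah": cap, "capacityFade_pct": fade})
```

In the paper's setting, a real triple store and ontology would replace this toy structure; the point is that data-set generation becomes a query over integrated knowledge rather than bespoke joins across raw sources.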

2021 · Vol 3
Author(s): Anastassia Lauterbach

Discussions around Covid-19 apps and models demonstrated that the primary challenges for AI and data science concern governance and ethics. Personal information was involved in building data sets, and it was unclear how this information could be used in large-scale models to provide predictions and insights while observing privacy requirements. Most people expected a lot from technology but were unwilling to sacrifice part of their privacy to build it. Conversely, regulators and policy makers require AI and data science practitioners to ensure optimal public health and national security while avoiding these privacy-related struggles. Their choices vary widely from country to country and are driven more by cultural factors than by machine learning capabilities. The question is whether current ways of designing technology and working with data sets are sustainable and lead to good outcomes for individuals and their communities. At the same time, Covid-19 made it obvious that economies and societies cannot succeed without far-reaching digital policies touching every aspect of how we provide and receive education, live, and work. Most regions, businesses, and individuals struggled to benefit from the competitive capabilities modern data technologies could bring. This opinion paper suggests how Germany and Europe can rethink their digital policy by recognizing the value of data, introducing Data IDs for consumers and businesses, committing to support innovation in decentralized data technologies, introducing the concept of Data Trusts, and making education about data compulsory from an early school age. It also discusses the advantages of data tokens in shaping a new ecosystem for decentralized data exchange.
Furthermore, it emphasizes the necessity of developing and promoting technologies for working with small data sets and handling data in compliance with privacy regulations, keeping in mind the environmental costs of betting on big data and large-scale machine learning models. Finally, it calls for innovation to become an integral part of any data scientist's job.


Author(s): Bella Yigong Zhang, Mark Chignell

Human Factors Engineering (HFE) is an applied discipline that uses a wide range of methodologies to improve the design of systems and devices for human use. Underpinning all human factors design is the maxim to fit the task/machine/system to the human rather than vice versa. While some HFE methods, such as task analysis and anthropometrics, remain relatively fixed over time, areas such as human-technology interaction are strongly influenced by fast-evolving technological trends. In the era of big data, human factors engineers need a good understanding of topics such as machine learning, advanced data analytics, and data visualization so that they can design data-driven products that involve big data sets. There is a natural lag between industrial trends and HFE curricula, leading to gaps between what people are taught and what they will need to know. In this paper, we present the results of a survey of HFE practitioners (N=101) and demonstrate the need to include data science and machine learning components in HFE curricula.


2020
Author(s): Martha A. Zaidan, Pak L. Fung, Darren Wraith, Tuomo Nieminen, Tareq Hussein, et al.

Data Mining (DM) and Machine Learning (ML) have become very popular modern statistical learning tools for solving many complex scientific problems. In this work, we present two case studies that used DM and ML techniques to enhance new-particle formation (NPF) identification and analysis. Extensive measurements and large data sets related to NPF and other ambient variables have been collected in arctic and boreal regions. The focus of our studies is the SMEAR II station in the Hyytiälä forest, Finland, which lies in the area of interest of the Pan-Eurasian Experiment (PEEX).

Atmospheric NPF is an important source of climatically relevant atmospheric aerosol particles. NPF is typically observed by monitoring the time evolution of ambient aerosol particle size distributions. Because real-world ambient data are noisy, the most reliable current way to classify measurement days into NPF event and non-event days is manual visual inspection. However, with long multi-year time series this manual labour is extremely time-consuming, and human subjectivity makes it difficult to compare results across data sets. In the first case study, an ML classifier is trained on a manually generated database to classify NPF event and non-event days. The results demonstrate the potential of ML-based approaches and suggest further exploration in this direction.

Furthermore, NPF is a highly non-linear process that includes the atmospheric chemistry of precursors and clustering physics, as well as subsequent growth before NPF can be observed. Thanks to ongoing efforts, a tremendous amount of atmospheric data now exists, obtained through continuous measurements directly from the atmosphere. This volume of data makes manual analysis difficult but, on the other hand, enables the use of modern data science techniques. In the second case study, we demonstrate the use of a DM method, mutual information (MI), to relate NPF events to a wide variety of simultaneously monitored ambient variables. The proposed MI method reproduces the same results while operating without supervision and without requiring a deep understanding of the underlying physics.
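As an illustration of the MI idea on toy data (the event labels and the binned ambient variable below are fabricated, not SMEAR II measurements), discrete mutual information between NPF event labels and a monitored variable can be estimated with standard-library Python:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(X;Y) in bits, from empirical frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy data: NPF event days (1) vs non-event days (0), alongside a
# binned ambient variable (e.g. "low"/"high" condensation sink, invented here).
events = [1, 1, 1, 0, 0, 0, 1, 0]
cs_bin = ["low", "low", "low", "high", "high", "high", "low", "high"]
mi = mutual_information(events, cs_bin)
```

A variable that perfectly separates event from non-event days carries the full one bit of label entropy; in practice, MI would be estimated from binned multi-year measurements and used to rank ambient variables by their relevance to NPF.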


Water · 2018 · Vol 10 (9) · pp. 1278
Author(s): Rembrandt H. E. M. Koppelaar, May N. Sule, Zoltán Kis, Foster K. Mensah, Xiaonan Wang, et al.

Improvements in water, sanitation and hygiene (WASH) service provision are hampered by limited open data availability. This paper presents a data integration framework, collects the data, and develops a material flow model that supports data-based policy and infrastructure development for the WASH sector. The model provides a robust quantitative mapping of the complete anthropogenic WASH flow cycle: from raw water intake to water use, wastewater and excreta generation, discharge, and treatment. The approach integrates various available sources using a process-chain, bottom-up engineering approach to improve the quality of WASH planning. The data integration framework and the modelling methodology are applied to the Greater Accra Metropolitan Area (GAMA), Ghana, achieving the highest level of understanding of the GAMA WASH sector to date and enabling scenario testing for future WASH developments. The results show that 96% of the population had access to improved safe water in 2010 if sachet and bottled water are included, but only 67% if they are excluded. Additionally, 66% of the 338,000 m3 per day of generated wastewater is unsafely disposed of locally, with 23% entering open drains and 11% entering sewage pipes, indicating poor sanitation coverage. Less than 0.5% of wastewater was treated in 2014, with only 18% of the 43,000 m3 per day treatment capacity operational. The combined data sets are made available to support research and sustainable development activities.


Psychology · 2020
Author(s): Jeffrey Stanton

The term “data science” refers to an emerging field of research and practice that focuses on obtaining, processing, visualizing, analyzing, preserving, and re-using large collections of information. A related term, “big data,” has been used to refer to one of the important challenges faced by data scientists in many applied environments: the need to analyze large data sources, in certain cases using high-speed, real-time data analysis techniques. Data science encompasses much more than big data, however, as a result of many advancements in cognate fields such as computer science and statistics. Data science has also benefited from the widespread availability of inexpensive computing hardware—a development that has enabled “cloud-based” services for the storage and analysis of large data sets. The techniques and tools of data science have broad applicability in the sciences. Within the field of psychology, data science offers new opportunities for data collection and data analysis that have begun to streamline and augment efforts to investigate the brain and behavior. The tools of data science also enable new areas of research, such as computational neuroscience. As an example of the impact of data science, psychologists frequently use predictive analysis as an investigative tool to probe the relationships between a set of independent variables and one or more dependent variables. While predictive analysis has traditionally been accomplished with techniques such as multiple regression, recent developments in the area of machine learning have put new predictive tools in the hands of psychologists. These machine learning tools relax distributional assumptions and facilitate exploration of non-linear relationships among variables. These tools also enable the analysis of large data sets by opening options for parallel processing. 
In this article, a range of relevant areas from data science is reviewed for applicability to key research problems in psychology including large-scale data collection, exploratory data analysis, confirmatory data analysis, and visualization. This bibliography covers data mining, machine learning, deep learning, natural language processing, Bayesian data analysis, visualization, crowdsourcing, web scraping, open source software, application programming interfaces, and research resources such as journals and textbooks.
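The contrast drawn above between traditional regression and machine-learning predictors can be made concrete with a toy example (hypothetical data; a simple 1-nearest-neighbour rule stands in for more sophisticated learners): a straight-line fit cannot capture a quadratic relationship that a non-parametric method picks up without any distributional assumption:

```python
# Toy non-linear relationship y = x^2, with a least-squares line vs a
# 1-nearest-neighbour predictor (both implemented from scratch for clarity).
xs = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2]
ys = [x * x for x in xs]

# Ordinary least squares for y = a*x + b
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

def linear(x):
    return a * x + b

def nearest_neighbour(x):
    # Predict the y-value of the closest training point
    return min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]

# Mean absolute error on the training points
mae_lin = sum(abs(linear(x) - y) for x, y in zip(xs, ys)) / n
mae_nn = sum(abs(nearest_neighbour(x) - y) for x, y in zip(xs, ys)) / n
```

Evaluating on the training points flatters the nearest-neighbour rule, of course; the point is only that it makes no linearity assumption, which is precisely what lets machine-learning tools explore non-linear relationships among variables. On these symmetric data the best straight line is flat (slope zero), so its error is large regardless of fitting.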


2020 · Vol 8 (2) · pp. 326-339
Author(s): Sebastian Haunss, Jonas Kuhn, Sebastian Padó, Andre Blessing, Nico Blokker, et al.

This article investigates the integration of machine learning into the political claim annotation workflow, with the goal of partially automating the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment comparing the annotation quality of annotators with and without machine learning-based annotation support. The design and setting aim to measure and evaluate: a) annotation speed; b) annotation quality; and c) applicability to the use case of discourse network generation. While the results indicate only slight increases in annotation speed, the authors find a moderate boost in annotation quality. Additionally, with manual annotation of the actors and filtering out of false positives, the machine learning-based annotation suggestions allow the authors to fully recover the core network of the discourse as extracted from the articles annotated during the experiment. This is due to the redundancy naturally present in the annotated texts. Thus, assuming a research focus on the network core rather than the complete network, AI-based annotation can provide reliable information about discourse networks with much less human intervention than the traditional manual approach.
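The redundancy argument can be sketched as follows (actors and claims below are invented for illustration, not taken from the MARDY corpus): if each annotation links an actor to a claim, then frequently recurring actor-claim edges form the core network, and losing a few annotations leaves that core intact:

```python
from collections import Counter

# Hypothetical annotations: (actor, stance, claim) triples extracted from
# multiple articles. The same core edges recur across articles.
annotations = [
    ("PartyA", "supports", "border-closure"),
    ("PartyA", "supports", "border-closure"),
    ("PartyA", "supports", "border-closure"),
    ("NGO-X", "opposes", "border-closure"),
    ("NGO-X", "opposes", "border-closure"),
    ("PartyB", "supports", "subsidies"),   # appears only once: peripheral
]

def core_network(annos, min_support=2):
    """Keep only edges observed at least min_support times."""
    counts = Counter(annos)
    return {edge for edge, count in counts.items() if count >= min_support}

full_core = core_network(annotations)
# Simulate losing one annotation (e.g. filtered out as a false positive):
degraded_core = core_network(annotations[1:])
```

Because core edges are multiply attested, the degraded core equals the full core even after an annotation is dropped, while singleton (peripheral) edges never enter the core; this is the sense in which redundancy makes the network core robust to imperfect automated annotation.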


Challenges · 2021 · Vol 12 (1) · pp. 2
Author(s): Tilman Klaeger, Sebastian Gottschall, Lukas Oehm

Much research is done on data analytics and machine learning for data coming from industrial processes. In practice, many pitfalls restrain the application of these modern technologies, especially in brownfield applications. With this paper, we want to show the state of the art and what to expect when working with stock machines in the field. The paper reviews the literature on challenges for cyber-physical production systems (CPPS) in brownfield applications, combined with our own experience and findings gained while setting up such systems in processing and packaging machines as well as in other areas. A major focus is data collection, which tends to be more cumbersome than most people expect. In addition, data quality for machine learning applications is a challenge once one leaves the laboratory and its academic data sets; topics here include missing ground truth and the lack of semantic description of the data. A final challenge covered is IT security and passing data through firewalls to enable the cyber part of CPPS. All of these findings show that the potential of data-driven production systems depends strongly on data collection if the proclaimed new automation systems, with more flexibility, improved human–machine interaction, better process stability, and thus less waste during manufacturing, are to be built.


2019 · Vol 73 (12) · pp. 1001-1005
Author(s): Richard A. Lewis, Peter Ertl, Nadine Schneider, Nikolaus Stiefl

Machine Learning and Data Science have enjoyed a renaissance due to the availability of increased computational power and larger data sets. Many questions that were previously beyond our scope can now be asked and answered. This does not translate instantly into new tools usable by those not skilled in the field, as many of the issues and traps still exist. In this paper, we look at some of the new tools we have created, and some of the difficulties that still need to be addressed in the transition from a project run by an expert to a tool for the bench chemist.


Author(s): Aman

It is important that companies are able to identify fraudulent credit card transactions so that customers are not charged for items they did not purchase. Such problems can be addressed with data science and machine learning. The aim of this project is to illustrate the modelling of a credit card transaction data set using machine learning. Our objective is to detect 100% of the fraudulent transactions while minimizing incorrect fraud classifications. Credit card fraud detection is an example of a classification problem. In this process, we have focused on analysing and pre-processing the data set, as well as deploying multiple anomaly detection algorithms, such as Local Outlier Factor and Isolation Forest, on the PCA-transformed credit card transaction data.
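As a hedged sketch of the described pipeline (synthetic two-dimensional data stand in for the PCA-transformed transaction features; this is not the project's actual data or code), both named detectors are available in scikit-learn:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic stand-in for PCA-transformed features:
# 300 normal transactions near the origin, 5 anomalies far away.
normal = rng.normal(0, 1, size=(300, 2))
fraud = rng.normal(8, 1, size=(5, 2))
X = np.vstack([normal, fraud])

# Isolation Forest isolates anomalies with fewer random splits
iso = IsolationForest(random_state=0).fit(X)
iso_pred = iso.predict(X)          # -1 = anomaly, 1 = inlier

# Local Outlier Factor compares each point's density to its neighbours'
lof = LocalOutlierFactor(n_neighbors=20)
lof_pred = lof.fit_predict(X)      # -1 = anomaly, 1 = inlier
```

Both algorithms flag the injected outliers without using labels; in the actual task, their predictions would be compared against the known fraud labels to measure recall and the rate of incorrect fraud classifications.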

