Provenance- and machine learning-based recommendation of parameter values in scientific workflows

2021 ◽  
Vol 7 ◽  
pp. e606
Author(s):  
Daniel Silva Junior ◽  
Esther Pacitti ◽  
Aline Paes ◽  
Daniel de Oliveira

Scientific Workflows (SWfs) have revolutionized how scientists in various domains of science conduct their experiments. The management of SWfs is performed by complex tools that provide support for workflow composition, monitoring, execution, and the capture and storage of the data generated during execution. In some cases, they also provide components to ease the visualization and analysis of the generated data. During the workflow’s composition phase, programs must be selected to perform the activities defined in the workflow specification. These programs often require additional parameters that serve to adjust the program’s behavior according to the experiment’s goals. Consequently, workflows commonly have many parameters to be manually configured, often more than one hundred. Choosing parameter values incorrectly can lead to crashed workflow executions or undesired results. As the execution of data- and compute-intensive workflows is commonly performed in a high-performance computing environment (e.g., a cluster, a supercomputer, or a public cloud), an unsuccessful execution represents a waste of time and resources. In this article, we present FReeP—Feature Recommender from Preferences, a parameter value recommendation method that is designed to suggest values for workflow parameters, taking into account past user preferences. FReeP is based on Machine Learning techniques, particularly Preference Learning. FReeP is composed of three algorithms, where two of them aim at recommending the value for one parameter at a time, and the third makes recommendations for n parameters at once. The experimental results obtained with provenance data from two broadly used workflows showed FReeP’s usefulness in the recommendation of values for one parameter. Furthermore, the results indicate the potential of FReeP to recommend values for n parameters in scientific workflows.
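The core idea of preference-aware parameter recommendation can be illustrated with a minimal sketch: given provenance records of past executions and the parameter values the user has already fixed (their preferences), recommend the most frequent value for the target parameter among the matching runs. This majority-vote baseline is purely illustrative (it is not one of FReeP's actual three algorithms), and the parameter names and provenance data below are hypothetical.

```python
from collections import Counter

def recommend_value(provenance, preferences, target):
    """Recommend a value for `target` by majority vote over past
    executions that agree with the user's fixed preferences.
    (Illustrative baseline only, not the actual FReeP algorithms.)"""
    matching = [run for run in provenance
                if all(run.get(k) == v for k, v in preferences.items())]
    if not matching:            # fall back to the full provenance set
        matching = provenance
    votes = Counter(run[target] for run in matching if target in run)
    return votes.most_common(1)[0][0]

# Hypothetical provenance records from past workflow executions:
provenance = [
    {"aligner": "bwa",    "threads": 8,  "evalue": 1e-5},
    {"aligner": "bwa",    "threads": 8,  "evalue": 1e-5},
    {"aligner": "bwa",    "threads": 16, "evalue": 1e-3},
    {"aligner": "bowtie", "threads": 8,  "evalue": 1e-5},
]
print(recommend_value(provenance, {"aligner": "bwa"}, "threads"))  # → 8
```

A real preference-learning approach would generalize beyond exact matches (e.g., by learning a model over the provenance), but the filter-then-aggregate shape is the same.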

10.6036/10007 ◽  
2021 ◽  
Vol 96 (5) ◽  
pp. 528-533
Author(s):  
XAVIER LARRIVA NOVO ◽  
MARIO VEGA BARBAS ◽  
VICTOR VILLAGRA ◽  
JULIO BERROCAL

Cybersecurity has stood out in recent years with the aim of protecting information systems. Different methods, techniques, and tools have been used to exploit the existing vulnerabilities in these systems. Therefore, it is essential to develop and improve new technologies, as well as intrusion detection systems that allow detecting possible threats. However, the use of these technologies requires highly qualified cybersecurity personnel to analyze the results and reduce the large number of false positives that these technologies present in their results. This generates the need to research and develop new high-performance cybersecurity systems that allow efficient analysis and resolution of these results. This research presents the application of machine learning techniques to classify real traffic, in order to identify possible attacks. The study has been carried out using machine learning tools applying deep learning algorithms such as multi-layer perceptron and long short-term memory (LSTM). Additionally, this document presents a comparison between the results obtained by applying the aforementioned algorithms and non-deep-learning algorithms such as random forest and decision tree. Finally, the results obtained are presented, showing that the LSTM algorithm is the one that provides the best results in relation to precision and logarithmic loss.
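The two evaluation metrics named in the abstract, precision and logarithmic loss, can be stated concretely. The sketch below is a minimal stdlib implementation of both; the traffic labels and predicted probabilities are made-up numbers, not data from the study.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary logarithmic loss: lower is better.
    y_prob holds predicted probabilities of the positive (attack) class."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def precision(y_true, y_pred):
    """Fraction of predicted attacks that really are attacks."""
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical ground truth and model probabilities:
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.6, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
print(precision(y_true, y_pred))  # → 1.0
print(round(log_loss(y_true, y_prob), 4))  # → 0.2838
```

Log loss rewards well-calibrated probabilities rather than just correct hard labels, which is why it complements precision when ranking models such as LSTM against random forest.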


Proceedings ◽  
2020 ◽  
Vol 54 (1) ◽  
pp. 8
Author(s):  
Julio J. Estévez-Pereira ◽  
Diego Fernández ◽  
Francisco J. Novoa

While traditional network security methods have proven useful until now, the flexibility of machine learning techniques makes them a solid candidate in the current networking landscape. In this paper, we assess how well the latter are capable of detecting security threats in a corporate network. To that end, we configure and compare several models to find the one that best fits our needs. Furthermore, we distribute the computational load and storage so we can handle extensive volumes of data. The algorithms that we use to create our models, Random Forest, Naive Bayes, and Deep Neural Networks (DNN), are both diverse and tested in other papers, in order to make our comparison richer. For the distribution phase, we operate with Apache Structured Streaming, PySpark, and MLlib. As for the results, it is relevant to mention that our dataset has been found to be effectively modelable with just a reduced number of features. Finally, given the outcomes obtained, we find this line of research encouraging and, therefore, this approach worth pursuing.
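The distribution step the authors delegate to PySpark follows a classic map-then-reduce shape: each worker computes a partial summary of its partition, and the partials are merged on the driver. As a minimal sketch, plain Python stands in for the Spark machinery below; the flow records (bytes, packets, label) are hypothetical.

```python
from functools import reduce

# Hypothetical flow records: (bytes, packets, label), label 1 = attack.
# In the paper, PySpark/MLlib does the heavy lifting; plain Python here
# just shows the map -> reduce shape of the distributed computation.
partitions = [
    [(500, 4, 0), (1200, 9, 1)],
    [(300, 2, 0), (7000, 50, 1), (450, 3, 0)],
]

def summarize(part):
    """Per-partition partial sums (what each worker would compute)."""
    n = len(part)
    attacks = sum(label for _, _, label in part)
    return n, attacks

def merge(a, b):
    """Combine partial results (the reduce step on the driver)."""
    return a[0] + b[0], a[1] + b[1]

n, attacks = reduce(merge, map(summarize, partitions))
print(n, attacks)  # → 5 2
```

Because `merge` is associative, the partials can be combined in any order, which is exactly the property Spark exploits to aggregate across many machines.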


Machine learning techniques with high performance computing technologies can create various new opportunities in the agriculture domain. This paper provides a comprehensive review of various papers concentrating on machine learning (ML) and deep learning applications in agriculture. The paper is categorized into three sections: (a) yield prediction using machine learning techniques, (b) price prediction, and (c) leaf disease detection using neural networks. In this paper we study the comparison of neural network models with existing models. The findings of this survey indicate that deep learning models give high accuracy and outperform traditional image processing techniques, and that ML techniques outperform various traditional techniques in prediction.


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Yang Liu ◽  
Youbo Liu ◽  
Junyong Liu ◽  
Maozhen Li ◽  
Tingjian Liu ◽  
...  

Transient stability assessment plays a vital role in modern power systems. For this purpose, machine learning techniques have been widely employed to find critical conditions and recognize transient behaviors based on massive data analysis. However, the ever-increasing volume of data generated from power systems poses a number of challenges to traditional machine learning techniques, which are computationally intensive when running on standalone computers. This paper presents a MapReduce-based high-performance neural network to enable fast stability assessment of power systems. Hadoop, an open-source implementation of the MapReduce model, is first employed to parallelize the neural network. The parallel neural network is further enhanced with HaLoop to reduce the computation overhead incurred in the iteration process of the neural network. In addition, ensemble techniques are employed to accommodate the accuracy loss of the parallelized neural network in classification. The parallelized neural network is evaluated with both the IEEE 68-node system and a real power system in terms of computation speedup and stability assessment.
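The ensemble step used to recover classification accuracy can be illustrated with a majority-vote sketch: each sub-model (in the paper, a neural network trained in parallel on a data partition) votes on every case, and the ensemble emits the majority label. The sub-model predictions below are hypothetical, and the voting combiner is only one of several possible ensemble rules.

```python
from collections import Counter

def ensemble_predict(member_predictions):
    """Majority vote across sub-models trained on data partitions.
    (Sketch of the ensemble step only; the paper's sub-models are
    neural networks parallelized with Hadoop/HaLoop.)"""
    results = []
    for votes in zip(*member_predictions):
        results.append(Counter(votes).most_common(1)[0][0])
    return results

# Three sub-models classify five operating conditions as
# stable (1) or unstable (0); each model errs on a different case.
m1 = [1, 0, 1, 1, 0]
m2 = [1, 0, 0, 1, 0]
m3 = [1, 1, 1, 1, 0]
print(ensemble_predict([m1, m2, m3]))  # → [1, 0, 1, 1, 0]
```

No single sub-model above is perfect, yet the vote is, which is the intuition behind using an ensemble to offset the accuracy each partitioned model loses by seeing only part of the data.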


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1186
Author(s):  
Hochong Park ◽  
Joo-Hiuk Son

Terahertz imaging and time-domain spectroscopy have been widely used to characterize the properties of test samples in various biomedical and engineering fields. Many of these tasks require the analysis of acquired terahertz signals to extract embedded information, which can be achieved using machine learning. Recently, machine learning techniques have developed rapidly, and many new learning models and learning algorithms have been investigated. Therefore, combined with state-of-the-art machine learning techniques, terahertz applications can achieve levels of performance that modeling techniques predating the machine learning era cannot. In this review, we introduce the concept of machine learning and basic machine learning techniques, and examine methods for performance evaluation. We then summarize representative examples of terahertz imaging and time-domain spectroscopy that are conducted using machine learning.


Materials ◽  
2021 ◽  
Vol 14 (22) ◽  
pp. 7034
Author(s):  
Yue Xu ◽  
Waqas Ahmad ◽  
Ayaz Ahmad ◽  
Krzysztof Adam Ostrowski ◽  
Marta Dudek ◽  
...  

The current trend in modern research revolves around novel techniques that can predict the characteristics of materials without consuming time, effort, and experimental costs. The adaptation of machine learning techniques to compute the various properties of materials is gaining more attention. This study aims to use both standalone and ensemble machine learning techniques to forecast the 28-day compressive strength of high-performance concrete. One standalone technique (support vector regression (SVR)) and two ensemble techniques (AdaBoost and random forest) were applied for this purpose. To validate the performance of each technique, coefficient of determination (R2), statistical, and k-fold cross-validation checks were used. Additionally, the contribution of input parameters towards the prediction of results was determined by applying sensitivity analysis. It was proven that all the techniques employed showed improved performance in predicting the outcomes. The random forest model was the most accurate, with an R2 value of 0.93, compared to the support vector regression and AdaBoost models, with R2 values of 0.83 and 0.90, respectively. In addition, statistical and k-fold cross-validation checks validated the random forest model as the best performer based on lower error values. However, the prediction performance of the support vector regression and AdaBoost models was also within an acceptable range. This shows that novel machine learning techniques can be used to predict the mechanical properties of high-performance concrete.
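The two validation tools named in the abstract, the coefficient of determination and k-fold cross-validation, can be sketched concisely. Below is a minimal stdlib implementation of R² and of a contiguous k-fold index splitter; the strength values are made-up numbers, not the study's data, and real practice would shuffle before splitting.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous validation folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# Hypothetical measured vs. predicted 28-day strengths (MPa):
y_true = [40.0, 55.0, 62.0, 48.0, 70.0]
y_pred = [42.0, 53.0, 60.0, 50.0, 68.0]
print(round(r_squared(y_true, y_pred), 3))  # → 0.964
print(k_fold_indices(5, 2))  # → [[0, 1, 2], [3, 4]]
```

An R² of 0.93, as reported for the random forest model, means the model explains 93% of the variance in measured compressive strength; cross-validation checks that this holds across held-out folds rather than on one lucky split.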


Author(s):  
Lisanne V. van Dijk ◽  
Clifton D. Fuller

The advent of large-scale high-performance computing has allowed the development of machine-learning techniques in oncologic applications. Among these, there has been substantial growth in radiomics (machine-learning texture analysis of images) and artificial intelligence (which uses deep-learning techniques for “learning algorithms”); however, clinical implementation has yet to be realized at scale. To improve implementation, opportunities, mechanics, and challenges, models of imaging-enabled artificial intelligence approaches need to be understood by clinicians who make the treatment decisions. This article aims to convey the basic conceptual premises of radiomics and artificial intelligence using head and neck cancer as a use case. This educational overview focuses on approaches for head and neck oncology imaging, detailing current research efforts and challenges to implementation.

