Machine learning modelling of wet granulation scale-up using compressibility, compactibility and manufacturability parameters

2019
Vol 73 (3)
pp. 155-168
Author(s):  
Nada Millen ◽  
Aleksandar Kovacevic ◽  
Lalit Khera ◽  
Jelena Djuris ◽  
Svetlana Ibric

The purpose of this extensive study is to use a quality by design (QbD) approach and multiple machine learning algorithms to facilitate wet granulation process scale-up. The study investigated the extent of influence of both formulation and process variables, and the measured responses covered the compressibility, compactibility and manufacturability of a powder blend. Finally, the models developed on laboratory-scale samples were tested on pilot- and commercial-scale runs. Tablet detachment and ejection work were calculated from force-displacement measurements. Significant numerical and categorical input variables were identified with a stepwise regression model, and their importance was evaluated with a boosted trees model. Pilot-scale runs resulted in the highest tablet tensile strength and compaction work, as well as the highest detachment and ejection work. The critical quality attributes (CQAs) predicted most successfully were the compaction, decompaction and net work, as well as the tablet height. The most important input variable influencing all CQAs was the compaction force. The boosted regression trees model yielded the lowest root mean square error (RMSE) values for all of the responses. This work demonstrates the reliability of the predictions of the developed models, which can be used as part of a QbD approach for wet granulation scale-up.
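The screening-plus-importance workflow described above can be sketched as follows. This is a minimal illustration on synthetic data with invented feature names (compaction force, binder level, etc.); it is not the study's dataset, and the boosted-trees settings are placeholders.

```python
# Hedged sketch: fit a boosted-trees regressor and rank input variables by
# importance, loosely mirroring the stepwise-regression + boosted-trees
# workflow in the abstract. All data and names here are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200
# Hypothetical inputs: compaction force made dominant, as the study reports.
X = rng.uniform(size=(n, 4))  # [compaction_force, binder_pct, speed, moisture]
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)  # "net work", arbitrary units

model = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]  # most important first
```

In this toy setup, `ranking[0]` recovers the dominant input, analogous to the study's finding that compaction force drives all CQAs.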

2021
Vol 11 (9)
pp. 4251
Author(s):  
Jinsong Zhang ◽  
Shuai Zhang ◽  
Jianhua Zhang ◽  
Zhiliang Wang

In digital microfluidic experiments, droplet characteristics and flow patterns are generally identified and predicted by empirical methods, which make it difficult to mine large amounts of data. In addition, inevitable human intervention introduces inconsistent judgment standards, making comparison between different experiments cumbersome and almost impossible. In this paper, we use machine learning to build algorithms that automatically identify, judge, and predict flow patterns and droplet characteristics, turning an empirical judgment into an intelligent process. Unlike the usual machine learning setups, a generalized variable system is introduced to describe the different geometric configurations of digital microfluidics. Specifically, Buckingham's π theorem is adopted to obtain multiple groups of dimensionless numbers as the input variables of the machine learning algorithms. In verification, the SVM and BPNN algorithms successfully classified and predicted the different flow patterns and droplet characteristics (length and frequency). Compared with the primitive-parameter system, the dimensionless-number system was superior in predictive capability. The dimensionless numbers selected for the machine learning algorithms should have strong physical meaning rather than merely mathematical meaning. Applying dimensionless numbers reduced the dimensionality of the system and the amount of computation without losing the information of the primitive parameters.
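The dimensionless-input idea can be sketched as follows. The abstract does not list the specific groups used, so the capillary number and a flow-rate ratio are assumed here as examples; the labelling rule and thresholds are invented for demonstration.

```python
# Illustrative sketch: build dimensionless inputs (capillary number Ca and a
# flow-rate ratio) from raw parameters, then train an SVM classifier on them.
# The synthetic labels mimic a simple flow-pattern distinction; the threshold
# of Ca > 0.05 is an invented toy rule, not a measured regime boundary.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 300
mu = rng.uniform(1e-3, 1e-2, n)      # continuous-phase viscosity (Pa*s)
u = rng.uniform(0.01, 0.5, n)        # characteristic velocity (m/s)
sigma = rng.uniform(0.01, 0.05, n)   # interfacial tension (N/m)
q_ratio = rng.uniform(0.1, 10.0, n)  # dispersed/continuous flow-rate ratio

Ca = mu * u / sigma                  # capillary number (dimensionless)
X = np.column_stack([Ca, q_ratio])
y = (Ca > 0.05).astype(int)          # toy two-pattern label

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
acc = clf.score(X, y)                # training accuracy on the toy labels
```

The design choice mirrored here is that two dimensionless inputs replace four primitive parameters, reducing dimensionality without discarding their physical content.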


2021
Author(s):  
Thitaree Lertliangchai ◽  
Birol Dindoruk ◽  
Ligang Lu ◽  
Xi Yang

Abstract. Dew point pressure (DPP) is a key variable that may be needed to predict the condensate-to-gas ratio behavior of a reservoir, to address some production/completion related issues, and to calibrate/constrain EOS models for integrated modeling. However, DPP is a challenging property in terms of its predictability. Recognizing these complexities, we present a state-of-the-art method for DPP prediction using advanced machine learning (ML) techniques. We compare the outcomes of our methodology with those of published empirical correlation-based approaches on two datasets of small size and with different inputs. Our ML method noticeably outperforms the correlation-based predictors while also showing its flexibility and robustness even with small training datasets, provided various classes of fluids are represented within the datasets. We collected condensate PVT data from public-domain resources and the GeoMark RFDBASE, containing dew point pressure (the target variable) together with compositional data (mole percentage of each component), temperature, molecular weight (MW), and the MW and specific gravity (SG) of heptane plus as input variables. Using domain knowledge, before embarking on the study, we extensively checked the measurement quality and the outcomes using statistical techniques. We then applied advanced ML techniques to train predictive models with cross-validation to avoid overfitting the models to the small datasets. We compare our models against the best published DPP predictors based on empirical correlations. For fair comparisons, the correlation-based predictors were also trained using the underlying datasets. To improve the outcomes and generalize the input data, pseudo-critical properties and artificial proxy features were also employed.
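The cross-validation safeguard for small datasets can be sketched as follows. The features and target are synthetic stand-ins (the abstract's real inputs are compositions, temperature, MW, and SG of C7+), and the model choice is an assumption.

```python
# Minimal sketch: on a deliberately small synthetic "PVT-like" dataset,
# score a model with k-fold cross-validation instead of a single train/test
# split, so that every sample is used for both fitting and validation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 80                            # small sample size, as in the study
X = rng.uniform(size=(n, 5))      # placeholders for composition, T, MW, SG
y = 2000 + 1500 * X[:, 0] - 800 * X[:, 1] + 50 * rng.normal(size=n)  # "DPP"

model = GradientBoostingRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")  # 5 held-out folds
```

Averaging `scores` gives an out-of-sample estimate that is far less optimistic than in-sample fit on 80 points, which is the overfitting protection the abstract refers to.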


2020
Vol 12 (17)
pp. 2742
Author(s):  
Ehsan Kamali Maskooni ◽  
Seyed Amir Naghibi ◽  
Hossein Hashemi ◽  
Ronny Berndtsson

Groundwater (GW) is being uncontrollably exploited in various parts of the world as a result of the huge need for water supply brought about by population growth and industrialization. Bearing in mind the importance of GW potential assessment for reaching sustainability, this study uses remote sensing (RS)-derived driving factors as inputs to advanced machine learning algorithms (MLAs), comprising deep boosting and logistic model trees, to evaluate their efficiency. Their results are compared with three benchmark MLAs: boosted regression trees, k-nearest neighbors, and random forest. For this purpose, we first assembled different topographical, hydrological, RS-based, and lithological driving factors: altitude, slope degree, aspect, slope length, plan curvature, profile curvature, relative slope position (RSP), distance from rivers, river density, topographic wetness index, land use/land cover (LULC), normalized difference vegetation index (NDVI), distance from lineament, lineament density, and lithology. The GW spring inventory was divided into training (434 springs) and validation (186 springs) classes with a proportion of 70:30. The training dataset of springs, together with the driving factors, was fed into the MLAs, and the outputs were validated by different indices: accuracy, kappa, the receiver operating characteristics (ROC) curve, specificity, and sensitivity. Based upon the area under the ROC curve, the logistic model tree (87.813%) generated performance similar to deep boosting (87.807%), followed by the boosted regression trees (87.397%), random forest (86.466%), and k-nearest neighbors (76.708%) MLAs. The findings confirm the strong performance of the logistic model tree and deep boosting algorithms in modelling GW potential. Thus, their application can be suggested for other areas to obtain insight into GW-related barriers toward sustainability.
Further, the outcome of the logistic model tree algorithm shows the high impact of the RS-based factor NDVI, with a relative influence of 100, as well as the high influence of the distance from rivers, altitude, and RSP variables, with relative influences of 46.07, 43.47, and 37.20, respectively, on GW potential.
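The validation step of comparing MLAs by area under the ROC curve can be sketched as follows. The driving factors here are random placeholders rather than real conditioning layers; only the 434 + 186 sample split and the AUC comparison mirror the text, and the three classifiers stand in for the paper's larger set.

```python
# Hedged sketch: compare classifiers on a synthetic spring/non-spring
# dataset by ROC AUC on a 30% held-out validation set.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(620, 6))  # 620 points, as 434 + 186 in the study
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=620) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

aucs = {}
for name, clf in [("BRT", GradientBoostingClassifier(random_state=0)),
                  ("RF", RandomForestClassifier(random_state=0)),
                  ("kNN", KNeighborsClassifier())]:
    clf.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Ranking the entries of `aucs` reproduces, in miniature, the comparison the abstract reports in percent AUC.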


2010
Vol 7 (10)
pp. 3311-3332
Author(s):  
F. Oehler ◽  
J. C. Rutherford ◽  
G. Coco

Abstract. We propose to use machine learning (ML) algorithms to design a simplified denitrification model. Boosted regression trees (BRT) and artificial neural networks (ANN) were used to analyse the relationships and the relative influences of different input variables on total denitrification, and an ANN was designed as a simplified model to simulate total nitrogen emissions from the denitrification process. To calibrate the BRT and ANN models and test this method, we used a database obtained by collating datasets from the literature. We used bootstrapping to compute confidence intervals for the calibration and validation process. Both ML algorithms clearly outperformed NEMIS, a commonly used simplified model of nitrogen emissions based on denitrification potential, temperature, soil water content and nitrate concentration. The ML models used soil organic matter % in place of a denitrification potential, and pH as a fifth input variable. The BRT analysis reaffirms the importance of temperature, soil water content and nitrate concentration. Generalization, although limited to the data space of the database used to build the ML models, could be improved if pH is used to differentiate between soil types. Further improvements in model performance and generalization could be achieved by adding more data.
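The bootstrap confidence-interval idea for an ANN emission model can be sketched as follows. The five inputs mimic the drivers named above (temperature, soil water content, nitrate, soil organic matter %, pH), but the response surface, network size, and number of resamples are invented for illustration.

```python
# Sketch: refit a small ANN on bootstrap resamples of the database and take
# percentiles of its predictions at a query point as a 95% confidence band.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
n = 150
X = rng.uniform(size=(n, 5))   # [T, SWC, NO3, SOM%, pH], scaled to 0-1
y = 2 * X[:, 0] * X[:, 1] + X[:, 2] + 0.05 * rng.normal(size=n)  # toy N emission

x_new = np.array([[0.5, 0.5, 0.5, 0.5, 0.5]])  # hypothetical query conditions
preds = []
for b in range(20):                            # bootstrap resamples
    idx = rng.integers(0, n, n)                # sample rows with replacement
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                       random_state=b).fit(X[idx], y[idx])
    preds.append(net.predict(x_new)[0])
lo, hi = np.percentile(preds, [2.5, 97.5])     # 95% interval for the prediction
```

The spread between `lo` and `hi` reflects how sensitive the fitted ANN is to which literature datasets happen to be in the resample, which is what the bootstrapped intervals in the study quantify.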


Water
2021
Vol 13 (19)
pp. 2717
Author(s):  
Juan Mata ◽  
Fernando Salazar ◽  
José Barateiro ◽  
António Antunes

The main aim of structural safety control is the repeated assessment of the expected dam behaviour based on models and on the measurements and parameters that characterise the dam's response and condition. In recent years, there has been an increase in the use of data-based models for the analysis and interpretation of the structural behaviour of dams. Multiple linear regression is the conventional, widely used approach in dam engineering, although interesting results have been published based on machine learning algorithms such as artificial neural networks, support vector machines, random forests, and boosted regression trees. However, these models need to be carefully developed and properly assessed before their application in practice. This is even more relevant as an increase in users of machine learning models is expected. For this reason, this paper presents extensive work on the verification and validation of data-based models for the analysis and interpretation of observed dam behaviour, by means of the development of several machine learning models to interpret horizontal displacements in an arch dam in operation. Several validation techniques are applied, including historical data validation, sensitivity analysis, and predictive validation. The results are discussed and conclusions are drawn regarding the practical application of data-based models.
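The conventional multiple linear regression baseline mentioned above can be sketched as follows. In dam engineering this is commonly an HST-type model (hydrostatic, seasonal, and time terms); that specific form, and all data and coefficients here, are assumptions for illustration and not taken from the paper.

```python
# Minimal sketch of an HST-style multiple linear regression: horizontal
# displacement regressed on polynomial terms in reservoir level, annual
# harmonics, and a slow time drift. Synthetic daily data over one year.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 365
t = np.arange(n) / 365.0                                   # time (years)
h = 0.7 + 0.2 * np.sin(2 * np.pi * t) + 0.02 * rng.normal(size=n)  # level, scaled

# HST-style regressors: level polynomial, seasonal harmonics, time drift
X = np.column_stack([h, h**2, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])
disp = 5.0 * h + 2.0 * np.sin(2 * np.pi * t) + 0.3 * t + 0.1 * rng.normal(size=n)

mlr = LinearRegression().fit(X, disp)
r2 = mlr.score(X, disp)   # goodness of fit of the linear baseline
```

Machine learning alternatives (neural networks, random forests, boosted trees) are then compared against this kind of baseline, which is why the paper stresses validating them just as carefully.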


Author(s):  
Aaron Rodrigues

Abstract: Food sales forecasting is concerned with predicting future sales of food-related businesses such as supermarkets, grocery stores, restaurants, bakeries, and patisseries. With accurate short-term sales forecasts, companies can reduce the number of stocked and expired products within stores while also avoiding missed revenues. This research examines current machine learning algorithms for predicting food sales. It covers key design decisions for a data analyst working on food sales forecasting, such as the temporal granularity of sales data, the input variables to employ for forecasting, and the representation of the sales output variable. It also examines machine learning algorithms that have been used to forecast food sales and the appropriate metrics for assessing their performance. Finally, it discusses the major challenges and prospects for applied machine learning in the field of food sales forecasting.
Keywords: Food, Demand forecasting, Machine learning, Regression, Time-series forecasting, Sales prediction


2020
Vol 12 (23)
pp. 3976
Author(s):  
Nicholas Fiorentini ◽  
Mehdi Maboudi ◽  
Pietro Leandri ◽  
Massimo Losa ◽  
Markus Gerke

This paper introduces a methodology for predicting and mapping surface motion beneath road pavement structures caused by environmental factors. Persistent Scatterer Interferometric Synthetic Aperture Radar (PS-InSAR) measurements, geospatial analyses, and Machine Learning Algorithms (MLAs) are employed to this end. Two single learners, i.e., Regression Tree (RT) and Support Vector Machine (SVM), and two ensemble learners, i.e., Boosted Regression Trees (BRT) and Random Forest (RF), are utilized for estimating the surface motion ratio in terms of mm/year over the Province of Pistoia (Tuscany Region, central Italy, 964 km2), in which strong subsidence phenomena have occurred. The interferometric processing of 210 Sentinel-1 images from 2014 to 2019 makes it possible to exploit the average displacements of 52,257 Persistent Scatterers as output targets to predict. A set of 29 environment-related factors is preprocessed with SAGA-GIS, version 2.3.2, and ESRI ArcGIS, version 10.5, and employed as input features. Once the dataset has been prepared, three wrapper feature selection approaches (backward, forward, and bi-directional) are used to recognize the set of most relevant features for the modeling. A random 70%/30% split of the dataset identifies the training and test sets. Through a Bayesian Optimization Algorithm (BOA) and 10-Fold Cross-Validation (CV), the algorithms are trained and validated. The predictive performance of the MLAs is then evaluated and compared by plotting the Taylor diagram. Outcomes show that SVM and BRT are the most suitable algorithms: in the test phase, BRT has the highest correlation coefficient (0.96) and the lowest Root Mean Square Error (0.44 mm/year), while SVM has the smallest difference between the standard deviation of its predictions (2.05 mm/year) and that of the reference samples (2.09 mm/year). Finally, the algorithms are used to map surface motion over the study area.
We propose three case studies on critical stretches of two-lane rural roads to evaluate the reliability of the procedure. Road authorities could consider the proposed methodology for their monitoring, management, and planning activities.
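The train/validate loop described above can be sketched as follows. For portability, a small grid search stands in for the paper's Bayesian Optimization Algorithm; only the 70/30 split and 10-fold CV mirror the text, and the data are synthetic placeholders rather than PS-InSAR targets.

```python
# Sketch: 70/30 split, hyperparameter search with 10-fold CV on the training
# set (grid search substituted for Bayesian optimization), then a test-set
# RMSE in the target's units, analogous to the mm/year errors reported above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 8))    # stand-ins for the selected input features
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)  # toy motion ratio
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [2, 3]},
                      cv=10, scoring="neg_root_mean_squared_error")
search.fit(X_tr, y_tr)
rmse = -search.score(X_te, y_te)  # held-out RMSE of the tuned model
```

The key design point mirrored here is that hyperparameters are chosen only on cross-validated training folds, so the 30% test set gives an honest error estimate.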


2017
Vol 34 (10)
pp. 2329-2345
Author(s):  
Bijoy Vengasseril Thampi ◽  
Takmeng Wong ◽  
Constantin Lukashin ◽  
Norman G. Loeb

Abstract. Continuous monitoring of the earth radiation budget (ERB) is critical to the understanding of Earth's climate and its variability with time. The Clouds and the Earth's Radiant Energy System (CERES) instrument is able to provide a long record of ERB for such scientific studies. This manuscript, the first of a two-part paper, describes the new CERES algorithm for improving clear/cloudy scene classification without the use of coincident cloud imager data. The new algorithm is based on a subset of the modern artificial intelligence (AI) paradigm known as machine learning (ML). This paper describes the development and application of the ML algorithm known as random forests (RF), which is used to classify CERES broadband footprint measurements into clear and cloudy scenes. Results from the RF analysis, carried out using the CERES Single Scanner Footprint (SSF) data for January and July, are presented in the manuscript. The daytime RF misclassification rate (MCR) shows relatively large values (>30%) for snow, sea ice, and bright desert surface types, and lower values (<10%) for the forest surface type. Nighttime MCR values are in general relatively larger than the daytime values for most surface types. The modified MCR values drop below 4% for most surface types after thin-cloud data are excluded from the analysis. Sensitivity analysis shows that the number of input variables and the number of decision trees used in the RF analysis have a substantial influence on the classification error.
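The scene-classification step and its misclassification rate can be sketched as follows. The broadband "footprint" features and the labelling rule are invented for illustration; CERES SSF data and the operational feature set are not reproduced here.

```python
# Toy sketch: a random forest labels synthetic broadband footprints as clear
# or cloudy, and the misclassification rate (MCR, in percent) is computed on
# held-out data, analogous to the per-surface-type MCRs quoted above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 1000
sw = rng.uniform(50, 400, n)   # shortwave flux (W/m^2), placeholder values
lw = rng.uniform(150, 300, n)  # longwave flux (W/m^2), placeholder values
cloudy = (sw + 0.5 * lw + 20 * rng.normal(size=n) > 400).astype(int)  # toy label

X = np.column_stack([sw, lw])
X_tr, X_te, y_tr, y_te = train_test_split(X, cloudy, test_size=0.3,
                                          random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
mcr = 100.0 * float((rf.predict(X_te) != y_te).mean())  # misclassification %
```

The sensitivity result in the abstract corresponds to varying `n_estimators` and the number of feature columns and observing how `mcr` responds.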


Author(s):  
Adrián G. Bruzón ◽  
Patricia Arrogante-Funes ◽  
Fátima Arrogante-Funes ◽  
Fidel Martín-González ◽  
Carlos J. Novillo ◽  
...  

The risks associated with landslides are increasing, as are personal losses and material damage, in more and more areas of the world. These natural disasters are related to geological and extreme meteorological phenomena (e.g., earthquakes, hurricanes) occurring in regions that have already suffered similar natural catastrophes. Therefore, to effectively mitigate landslide risks, new methodologies must better identify and understand these landslide hazards so that they can be properly managed. Among such methodologies, those based on assessing landslide susceptibility improve the predictability of the areas where one of these disasters is most likely to occur. In recent years, much research has used machine learning algorithms to assess susceptibility using different sources of information, such as remote sensing data, spatial databases, and geological catalogues. This study presents a first attempt to develop a methodology based on an automatic machine learning (AutoML) framework. Such frameworks are intended to facilitate the development of machine learning models, with the aim of enabling researchers to focus on data analysis. The area used to test and validate this study is the central and southern region of Guerrero (Mexico), where we compare the performance of 16 machine learning algorithms. The best result is achieved by extra trees, with an area under the curve (AUC) of 0.983. This methodology yields better results than other similar methods because using an AutoML framework allows the researcher to focus on data treatment, to better understand the input variables, and to acquire greater knowledge about the processes involved in the landslides.
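The model-comparison idea behind the AutoML run can be sketched as follows, with three scikit-learn classifiers standing in for the 16 algorithms a framework would try automatically. The landslide inventory and conditioning factors here are synthetic, and the winner of this toy comparison is not guaranteed to match the study's.

```python
# Hedged sketch: fit several susceptibility classifiers on the same synthetic
# inventory and rank them by held-out ROC AUC, as an AutoML framework does
# across a much larger algorithm pool.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(600, 10))  # stand-ins for slope, lithology, rainfall, ...
y = (X[:, 0] * X[:, 1] + 0.5 * X[:, 2] > 0).astype(int)  # nonlinear toy label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for name, clf in [("extra_trees", ExtraTreesClassifier(random_state=0)),
                  ("random_forest", RandomForestClassifier(random_state=0)),
                  ("logistic", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    results[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
best = max(results, key=results.get)  # best-scoring model by AUC
```

An AutoML framework additionally automates preprocessing and hyperparameter tuning inside this loop, which is what frees the researcher to concentrate on the input data, as the abstract argues.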

