Machine learning modelling of wet granulation scale-up using compressibility, compactibility and manufacturability parameters

2019
Vol 73 (3)
pp. 155-168
Author(s):  
Nada Millen ◽  
Aleksandar Kovacevic ◽  
Lalit Khera ◽  
Jelena Djuris ◽  
Svetlana Ibric

The purpose of this extensive study is to use a quality by design (QbD) approach and multiple machine learning algorithms to facilitate wet granulation process scale-up. The study investigated the extent of influence of both formulation and process variables, and the measured responses covered the compressibility, compactibility and manufacturability of a powder blend. Finally, the models developed on laboratory-scale samples were tested on pilot- and commercial-scale runs. Tablet detachment and ejection work were calculated from force-displacement measurements. Significant numerical and categorical input variables were identified with a stepwise regression model, and their importance was evaluated with a boosted trees model. Pilot-scale runs resulted in the highest tablet tensile strength and compaction work, as well as the highest detachment and ejection work. The critical quality attributes (CQAs) predicted most successfully were the compaction, decompaction and net work, as well as the tablet height. The most important input variable influencing all CQAs was the compaction force. The boosted regression trees model yielded the lowest root mean square error (RMSE) values for all of the responses. This work demonstrates the reliability of the predictions of the developed models, which can be used as part of a QbD approach for wet granulation scale-up.
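The screening-plus-importance workflow described above can be sketched as follows. This is a minimal illustration on synthetic data with invented feature names (compaction force, binder level, etc.); it is not the study's dataset, and the boosted-trees settings are placeholders.

```python
# Hedged sketch: fit a boosted-trees regressor and rank input variables by
# importance, loosely mirroring the stepwise-regression + boosted-trees
# workflow in the abstract. All data and names here are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200
# Hypothetical inputs: compaction force made dominant, as the study reports.
X = rng.uniform(size=(n, 4))  # [compaction_force, binder_pct, speed, moisture]
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)  # "net work", arbitrary units

model = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]  # most important first
```

In this toy setup, `ranking[0]` recovers the dominant input, analogous to the study's finding that compaction force drives all CQAs.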

2021
Vol 11 (9)
pp. 4251
Author(s):  
Jinsong Zhang ◽  
Shuai Zhang ◽  
Jianhua Zhang ◽  
Zhiliang Wang

In digital microfluidic experiments, droplet characteristics and flow patterns are generally identified and predicted by empirical methods, which make it difficult to mine large amounts of data. In addition, inevitable human intervention introduces inconsistent judgment standards, making comparison between different experiments cumbersome and almost impossible. In this paper, we use machine learning to build algorithms that automatically identify, judge, and predict flow patterns and droplet characteristics, turning an empirical judgment into an intelligent process. Unlike the usual machine learning setups, a generalized variable system is introduced to describe the different geometric configurations of digital microfluidics. Specifically, Buckingham's π theorem is adopted to obtain multiple groups of dimensionless numbers as the input variables of the machine learning algorithms. In verification, the SVM and BPNN algorithms successfully classified and predicted the different flow patterns and droplet characteristics (length and frequency). Compared with the primitive-parameter system, the dimensionless-number system was superior in predictive capability. The dimensionless numbers selected for the machine learning algorithms should have strong physical meaning rather than merely mathematical meaning. Applying dimensionless numbers reduced the dimensionality of the system and the amount of computation without losing the information of the primitive parameters.
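The dimensionless-input idea can be sketched as follows. The abstract does not list the specific groups used, so the capillary number and a flow-rate ratio are assumed here as examples; the labelling rule and thresholds are invented for demonstration.

```python
# Illustrative sketch: build dimensionless inputs (capillary number Ca and a
# flow-rate ratio) from raw parameters, then train an SVM classifier on them.
# The synthetic labels mimic a simple flow-pattern distinction; the threshold
# of Ca > 0.05 is an invented toy rule, not a measured regime boundary.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 300
mu = rng.uniform(1e-3, 1e-2, n)      # continuous-phase viscosity (Pa*s)
u = rng.uniform(0.01, 0.5, n)        # characteristic velocity (m/s)
sigma = rng.uniform(0.01, 0.05, n)   # interfacial tension (N/m)
q_ratio = rng.uniform(0.1, 10.0, n)  # dispersed/continuous flow-rate ratio

Ca = mu * u / sigma                  # capillary number (dimensionless)
X = np.column_stack([Ca, q_ratio])
y = (Ca > 0.05).astype(int)          # toy two-pattern label

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
acc = clf.score(X, y)                # training accuracy on the toy labels
```

The design choice mirrored here is that two dimensionless inputs replace four primitive parameters, reducing dimensionality without discarding their physical content.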


2021
Author(s):  
Thitaree Lertliangchai ◽  
Birol Dindoruk ◽  
Ligang Lu ◽  
Xi Yang

Abstract. Dew point pressure (DPP) is a key variable that may be needed to predict the condensate-to-gas ratio behavior of a reservoir, to address some production/completion related issues, and to calibrate/constrain EOS models for integrated modeling. However, DPP is a challenging property in terms of its predictability. Recognizing these complexities, we present a state-of-the-art method for DPP prediction using advanced machine learning (ML) techniques. We compare the outcomes of our methodology with those of published empirical correlation-based approaches on two datasets of small size and with different inputs. Our ML method noticeably outperforms the correlation-based predictors while also showing its flexibility and robustness even with small training datasets, provided various classes of fluids are represented within the datasets. We collected condensate PVT data from public-domain resources and the GeoMark RFDBASE, containing dew point pressure (the target variable) together with compositional data (mole percentage of each component), temperature, molecular weight (MW), and the MW and specific gravity (SG) of heptane plus as input variables. Using domain knowledge, before embarking on the study, we extensively checked the measurement quality and the outcomes using statistical techniques. We then applied advanced ML techniques to train predictive models with cross-validation to avoid overfitting the models to the small datasets. We compare our models against the best published DPP predictors based on empirical correlations. For fair comparisons, the correlation-based predictors were also trained using the underlying datasets. To improve the outcomes and generalize the input data, pseudo-critical properties and artificial proxy features were also employed.
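The cross-validation safeguard for small datasets can be sketched as follows. The features and target are synthetic stand-ins (the abstract's real inputs are compositions, temperature, MW, and SG of C7+), and the model choice is an assumption.

```python
# Minimal sketch: on a deliberately small synthetic "PVT-like" dataset,
# score a model with k-fold cross-validation instead of a single train/test
# split, so that every sample is used for both fitting and validation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 80                            # small sample size, as in the study
X = rng.uniform(size=(n, 5))      # placeholders for composition, T, MW, SG
y = 2000 + 1500 * X[:, 0] - 800 * X[:, 1] + 50 * rng.normal(size=n)  # "DPP"

model = GradientBoostingRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")  # 5 held-out folds
```

Averaging `scores` gives an out-of-sample estimate that is far less optimistic than in-sample fit on 80 points, which is the overfitting protection the abstract refers to.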


2020
Vol 12 (17)
pp. 2742
Author(s):  
Ehsan Kamali Maskooni ◽  
Seyed Amir Naghibi ◽  
Hossein Hashemi ◽  
Ronny Berndtsson

Groundwater (GW) is being uncontrollably exploited in various parts of the world as a result of the huge need for water supply brought about by population growth and industrialization. Bearing in mind the importance of GW potential assessment for reaching sustainability, this study uses remote sensing (RS)-derived driving factors as inputs to advanced machine learning algorithms (MLAs), comprising deep boosting and logistic model trees, to evaluate their efficiency. Their results are compared with three benchmark MLAs: boosted regression trees, k-nearest neighbors, and random forest. For this purpose, we first assembled different topographical, hydrological, RS-based, and lithological driving factors: altitude, slope degree, aspect, slope length, plan curvature, profile curvature, relative slope position (RSP), distance from rivers, river density, topographic wetness index, land use/land cover (LULC), normalized difference vegetation index (NDVI), distance from lineament, lineament density, and lithology. The GW spring inventory was divided into training (434 springs) and validation (186 springs) classes with a proportion of 70:30. The training dataset of springs, together with the driving factors, was fed into the MLAs, and the outputs were validated by different indices: accuracy, kappa, the receiver operating characteristics (ROC) curve, specificity, and sensitivity. Based upon the area under the ROC curve, the logistic model tree (87.813%) generated performance similar to deep boosting (87.807%), followed by the boosted regression trees (87.397%), random forest (86.466%), and k-nearest neighbors (76.708%) MLAs. The findings confirm the strong performance of the logistic model tree and deep boosting algorithms in modelling GW potential. Thus, their application can be suggested for other areas to obtain insight into GW-related barriers toward sustainability.
Further, the outcome of the logistic model tree algorithm shows the high impact of the RS-based factor NDVI, with a relative influence of 100, as well as the high influence of the distance from rivers, altitude, and RSP variables, with relative influences of 46.07, 43.47, and 37.20, respectively, on GW potential.
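The validation step of comparing MLAs by area under the ROC curve can be sketched as follows. The driving factors here are random placeholders rather than real conditioning layers; only the 434 + 186 sample split and the AUC comparison mirror the text, and the three classifiers stand in for the paper's larger set.

```python
# Hedged sketch: compare classifiers on a synthetic spring/non-spring
# dataset by ROC AUC on a 30% held-out validation set.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(620, 6))  # 620 points, as 434 + 186 in the study
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=620) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

aucs = {}
for name, clf in [("BRT", GradientBoostingClassifier(random_state=0)),
                  ("RF", RandomForestClassifier(random_state=0)),
                  ("kNN", KNeighborsClassifier())]:
    clf.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Ranking the entries of `aucs` reproduces, in miniature, the comparison the abstract reports in percent AUC.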


2010
Vol 7 (10)
pp. 3311-3332
Author(s):  
F. Oehler ◽  
J. C. Rutherford ◽  
G. Coco

Abstract. We propose to use machine learning (ML) algorithms to design a simplified denitrification model. Boosted regression trees (BRT) and artificial neural networks (ANN) were used to analyse the relationships and the relative influences of different input variables on total denitrification, and an ANN was designed as a simplified model to simulate total nitrogen emissions from the denitrification process. To calibrate the BRT and ANN models and test this method, we used a database obtained by collating datasets from the literature. We used bootstrapping to compute confidence intervals for the calibration and validation process. Both ML algorithms clearly outperformed NEMIS, a commonly used simplified model of nitrogen emissions based on denitrification potential, temperature, soil water content and nitrate concentration. The ML models used soil organic matter % in place of a denitrification potential, and pH as a fifth input variable. The BRT analysis reaffirms the importance of temperature, soil water content and nitrate concentration. Generalization, although limited to the data space of the database used to build the ML models, could be improved if pH is used to differentiate between soil types. Further improvements in model performance and generalization could be achieved by adding more data.
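The bootstrap confidence-interval idea for an ANN emission model can be sketched as follows. The five inputs mimic the drivers named above (temperature, soil water content, nitrate, soil organic matter %, pH), but the response surface, network size, and number of resamples are invented for illustration.

```python
# Sketch: refit a small ANN on bootstrap resamples of the database and take
# percentiles of its predictions at a query point as a 95% confidence band.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
n = 150
X = rng.uniform(size=(n, 5))   # [T, SWC, NO3, SOM%, pH], scaled to 0-1
y = 2 * X[:, 0] * X[:, 1] + X[:, 2] + 0.05 * rng.normal(size=n)  # toy N emission

x_new = np.array([[0.5, 0.5, 0.5, 0.5, 0.5]])  # hypothetical query conditions
preds = []
for b in range(20):                            # bootstrap resamples
    idx = rng.integers(0, n, n)                # sample rows with replacement
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                       random_state=b).fit(X[idx], y[idx])
    preds.append(net.predict(x_new)[0])
lo, hi = np.percentile(preds, [2.5, 97.5])     # 95% interval for the prediction
```

The spread between `lo` and `hi` reflects how sensitive the fitted ANN is to which literature datasets happen to be in the resample, which is what the bootstrapped intervals in the study quantify.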


Water
2021
Vol 13 (19)
pp. 2717
Author(s):  
Juan Mata ◽  
Fernando Salazar ◽  
José Barateiro ◽  
António Antunes

The main aim of structural safety control is the repeated assessment of the expected dam behaviour based on models and on the measurements and parameters that characterise the dam's response and condition. In recent years, there has been an increase in the use of data-based models for the analysis and interpretation of the structural behaviour of dams. Multiple linear regression is the conventional, widely used approach in dam engineering, although interesting results have been published based on machine learning algorithms such as artificial neural networks, support vector machines, random forests, and boosted regression trees. However, these models need to be carefully developed and properly assessed before their application in practice. This is even more relevant as an increase in users of machine learning models is expected. For this reason, this paper presents extensive work on the verification and validation of data-based models for the analysis and interpretation of observed dam behaviour, by means of the development of several machine learning models to interpret horizontal displacements in an arch dam in operation. Several validation techniques are applied, including historical data validation, sensitivity analysis, and predictive validation. The results are discussed and conclusions are drawn regarding the practical application of data-based models.
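The conventional multiple linear regression baseline mentioned above can be sketched as follows. In dam engineering this is commonly an HST-type model (hydrostatic, seasonal, and time terms); that specific form, and all data and coefficients here, are assumptions for illustration and not taken from the paper.

```python
# Minimal sketch of an HST-style multiple linear regression: horizontal
# displacement regressed on polynomial terms in reservoir level, annual
# harmonics, and a slow time drift. Synthetic daily data over one year.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 365
t = np.arange(n) / 365.0                                   # time (years)
h = 0.7 + 0.2 * np.sin(2 * np.pi * t) + 0.02 * rng.normal(size=n)  # level, scaled

# HST-style regressors: level polynomial, seasonal harmonics, time drift
X = np.column_stack([h, h**2, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])
disp = 5.0 * h + 2.0 * np.sin(2 * np.pi * t) + 0.3 * t + 0.1 * rng.normal(size=n)

mlr = LinearRegression().fit(X, disp)
r2 = mlr.score(X, disp)   # goodness of fit of the linear baseline
```

Machine learning alternatives (neural networks, random forests, boosted trees) are then compared against this kind of baseline, which is why the paper stresses validating them just as carefully.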


Author(s):  
Aaron Rodrigues

Abstract: Food sales forecasting is concerned with predicting future sales of food-related businesses such as supermarkets, grocery stores, restaurants, bakeries, and patisseries. With accurate short-term sales forecasts, companies can reduce the number of stocked and expired products within stores while also avoiding missed revenues. This research examines current machine learning algorithms for predicting food sales. It covers key design decisions for a data analyst working on food sales forecasting, such as the temporal granularity of sales data, the input variables to employ for forecasting, and the representation of the sales output variable. It also examines machine learning algorithms that have been used to forecast food sales and the appropriate metrics for assessing their performance. Finally, it discusses the major challenges and prospects for applied machine learning in the field of food sales forecasting.
Keywords: Food, Demand forecasting, Machine learning, Regression, Time-series forecasting, Sales prediction


2020
Vol 12 (23)
pp. 3976
Author(s):  
Nicholas Fiorentini ◽  
Mehdi Maboudi ◽  
Pietro Leandri ◽  
Massimo Losa ◽  
Markus Gerke

This paper introduces a methodology for predicting and mapping surface motion beneath road pavement structures caused by environmental factors. Persistent Scatterer Interferometric Synthetic Aperture Radar (PS-InSAR) measurements, geospatial analyses, and Machine Learning Algorithms (MLAs) are employed to this end. Two single learners, i.e., Regression Tree (RT) and Support Vector Machine (SVM), and two ensemble learners, i.e., Boosted Regression Trees (BRT) and Random Forest (RF), are utilized for estimating the surface motion ratio in terms of mm/year over the Province of Pistoia (Tuscany Region, central Italy, 964 km2), in which strong subsidence phenomena have occurred. The interferometric processing of 210 Sentinel-1 images from 2014 to 2019 makes it possible to exploit the average displacements of 52,257 Persistent Scatterers as output targets to predict. A set of 29 environment-related factors is preprocessed with SAGA-GIS, version 2.3.2, and ESRI ArcGIS, version 10.5, and employed as input features. Once the dataset has been prepared, three wrapper feature selection approaches (backward, forward, and bi-directional) are used to recognize the set of most relevant features for the modeling. A random 70%/30% split of the dataset identifies the training and test sets. Through a Bayesian Optimization Algorithm (BOA) and 10-Fold Cross-Validation (CV), the algorithms are trained and validated. The predictive performance of the MLAs is then evaluated and compared by plotting the Taylor diagram. Outcomes show that SVM and BRT are the most suitable algorithms: in the test phase, BRT has the highest correlation coefficient (0.96) and the lowest Root Mean Square Error (0.44 mm/year), while SVM has the smallest difference between the standard deviation of its predictions (2.05 mm/year) and that of the reference samples (2.09 mm/year). Finally, the algorithms are used to map surface motion over the study area.
We propose three case studies on critical stretches of two-lane rural roads to evaluate the reliability of the procedure. Road authorities could consider the proposed methodology for their monitoring, management, and planning activities.
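The train/validate loop described above can be sketched as follows. For portability, a small grid search stands in for the paper's Bayesian Optimization Algorithm; only the 70/30 split and 10-fold CV mirror the text, and the data are synthetic placeholders rather than PS-InSAR targets.

```python
# Sketch: 70/30 split, hyperparameter search with 10-fold CV on the training
# set (grid search substituted for Bayesian optimization), then a test-set
# RMSE in the target's units, analogous to the mm/year errors reported above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 8))    # stand-ins for the selected input features
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)  # toy motion ratio
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [2, 3]},
                      cv=10, scoring="neg_root_mean_squared_error")
search.fit(X_tr, y_tr)
rmse = -search.score(X_te, y_te)  # held-out RMSE of the tuned model
```

The key design point mirrored here is that hyperparameters are chosen only on cross-validated training folds, so the 30% test set gives an honest error estimate.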


2017
Vol 34 (10)
pp. 2329-2345
Author(s):  
Bijoy Vengasseril Thampi ◽  
Takmeng Wong ◽  
Constantin Lukashin ◽  
Norman G. Loeb

Abstract. Continuous monitoring of the earth radiation budget (ERB) is critical to the understanding of Earth's climate and its variability with time. The Clouds and the Earth's Radiant Energy System (CERES) instrument is able to provide a long record of ERB for such scientific studies. This manuscript, the first of a two-part paper, describes the new CERES algorithm for improving clear/cloudy scene classification without the use of coincident cloud imager data. The new algorithm is based on a subset of the modern artificial intelligence (AI) paradigm known as machine learning (ML). This paper describes the development and application of the ML algorithm known as random forests (RF), which is used to classify CERES broadband footprint measurements into clear and cloudy scenes. Results from the RF analysis, carried out using the CERES Single Scanner Footprint (SSF) data for January and July, are presented in the manuscript. The daytime RF misclassification rate (MCR) shows relatively large values (>30%) for snow, sea ice, and bright desert surface types, and lower values (<10%) for the forest surface type. Nighttime MCR values are in general relatively larger than the daytime values for most surface types. The modified MCR values drop below 4% for most surface types after thin-cloud data are excluded from the analysis. Sensitivity analysis shows that the number of input variables and the number of decision trees used in the RF analysis have a substantial influence on the classification error.
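The scene-classification step and its misclassification rate can be sketched as follows. The broadband "footprint" features and the labelling rule are invented for illustration; CERES SSF data and the operational feature set are not reproduced here.

```python
# Toy sketch: a random forest labels synthetic broadband footprints as clear
# or cloudy, and the misclassification rate (MCR, in percent) is computed on
# held-out data, analogous to the per-surface-type MCRs quoted above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 1000
sw = rng.uniform(50, 400, n)   # shortwave flux (W/m^2), placeholder values
lw = rng.uniform(150, 300, n)  # longwave flux (W/m^2), placeholder values
cloudy = (sw + 0.5 * lw + 20 * rng.normal(size=n) > 400).astype(int)  # toy label

X = np.column_stack([sw, lw])
X_tr, X_te, y_tr, y_te = train_test_split(X, cloudy, test_size=0.3,
                                          random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
mcr = 100.0 * float((rf.predict(X_te) != y_te).mean())  # misclassification %
```

The sensitivity result in the abstract corresponds to varying `n_estimators` and the number of feature columns and observing how `mcr` responds.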


Author(s):  
Adrián G. Bruzón ◽  
Patricia Arrogante-Funes ◽  
Fátima Arrogante-Funes ◽  
Fidel Martín-González ◽  
Carlos J. Novillo ◽  
...  

The risks associated with landslides are increasing, as are personal losses and material damage, in more and more areas of the world. These natural disasters are related to geological and extreme meteorological phenomena (e.g., earthquakes, hurricanes) occurring in regions that have already suffered similar natural catastrophes. Therefore, to effectively mitigate landslide risks, new methodologies must better identify and understand these landslide hazards so that they can be properly managed. Among such methodologies, those based on assessing landslide susceptibility improve the predictability of the areas where one of these disasters is most likely to occur. In recent years, much research has used machine learning algorithms to assess susceptibility using different sources of information, such as remote sensing data, spatial databases, and geological catalogues. This study presents a first attempt to develop a methodology based on an automatic machine learning (AutoML) framework. Such frameworks are intended to facilitate the development of machine learning models, with the aim of enabling researchers to focus on data analysis. The area used to test and validate this study is the central and southern region of Guerrero (Mexico), where we compare the performance of 16 machine learning algorithms. The best result is achieved by extra trees, with an area under the curve (AUC) of 0.983. This methodology yields better results than other similar methods because using an AutoML framework allows the researcher to focus on data treatment, to better understand the input variables, and to acquire greater knowledge about the processes involved in the landslides.
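The model-comparison idea behind the AutoML run can be sketched as follows, with three scikit-learn classifiers standing in for the 16 algorithms a framework would try automatically. The landslide inventory and conditioning factors here are synthetic, and the winner of this toy comparison is not guaranteed to match the study's.

```python
# Hedged sketch: fit several susceptibility classifiers on the same synthetic
# inventory and rank them by held-out ROC AUC, as an AutoML framework does
# across a much larger algorithm pool.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(600, 10))  # stand-ins for slope, lithology, rainfall, ...
y = (X[:, 0] * X[:, 1] + 0.5 * X[:, 2] > 0).astype(int)  # nonlinear toy label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for name, clf in [("extra_trees", ExtraTreesClassifier(random_state=0)),
                  ("random_forest", RandomForestClassifier(random_state=0)),
                  ("logistic", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    results[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
best = max(results, key=results.get)  # best-scoring model by AUC
```

An AutoML framework additionally automates preprocessing and hyperparameter tuning inside this loop, which is what frees the researcher to concentrate on the input data, as the abstract argues.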

