Low-cost scalable discretization, prediction, and feature selection for complex systems

2020 ◽  
Vol 6 (5) ◽  
pp. eaaw0961 ◽  
Author(s):  
S. Gerber ◽  
L. Pospisil ◽  
M. Navandar ◽  
I. Horenko

Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (e.g., the very popular K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost improved quality scalable probabilistic approximation (SPA) algorithm, allowing for simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, parallel efficiency, and a linear scalability of iteration cost. Cross-validated applications of SPA to a range of large realistic data classification and prediction problems reveal marked cost and performance improvements. For example, SPA allows the data-driven next-day predictions of resimulated surface temperatures for Europe with the mean prediction error of 0.75°C on a common PC (being around 40% better in terms of errors and five to six orders of magnitude cheaper than with common computational instruments used by the weather services).
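For orientation, the K-means baseline that the abstract compares against can be sketched in a few lines. This is a minimal illustration of Lloyd's K-means iteration only, not the SPA algorithm, whose formulation is not given in the abstract. The function name `kmeans`, the toy points, and the choice of initial centroids are all assumptions made for the example.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd iteration: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Each continuous point is replaced by a discrete label, which is the sense of "discretization" used here; the result depends on the initial centroids.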

2019 ◽  
Author(s):  
S. Gerber ◽  
L. Pospisil ◽  
M. Navandar ◽  
I. Horenko

Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (for example, the very popular K-means clustering) are crucially limited in terms of quality and cost. We introduce a low-cost improved-quality Scalable Probabilistic Approximation (SPA) algorithm, allowing for simultaneous data-driven optimal discretization, feature selection and prediction. Cross-validated applications of SPA to a range of large realistic data classification and prediction problems reveal drastic cost and performance improvements. For example, SPA allows the unsupervised next-day surface temperature predictions for Europe with the mean cross-validated one-day prediction error of 0.75°C on a common PC (being around 40% better in terms of errors and five to six orders of magnitude cheaper than the next-day surface temperature predictions calculated on supercomputers and provided by the weather services). One Sentence Summary: The introduced computational tool allows drastic cost and quality gains for a broad range of science applications.


2005 ◽  
Vol 36 (1) ◽  
pp. 646 ◽  
Author(s):  
Hiap L. Ong ◽  
Ngwe Cheong ◽  
Jason Lo ◽  
Marty Metras ◽  
Ollie Woodard ◽  
...  

Life ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1165
Author(s):  
Kristýna Mezerová ◽  
Lubomír Starý ◽  
Pavel Zbořil ◽  
Ivo Klementa ◽  
Martin Stašek ◽  
...  

The frequent occurrence of E. coli positive for cyclomodulins such as colibactin (CLB), the cytotoxic necrotizing factor (CNF), and the cytolethal distending factor (CDT) in colorectal cancer (CRC) patients published so far provides the opportunity to use them as CRC screening markers. We examined the practicability and performance of a low-cost detection approach that relied on culture followed by simplified DNA extraction and PCR in E. coli isolates recovered from 130 CRC patients and 111 controls. Our results showed a statistically significant association between CRC and the presence of colibactin genes clbB and clbN, the cnf gene, and newly, the hemolytic phenotype of E. coli isolates. We also observed a significant increase in the mean number of morphologically distinct E. coli isolates per patient in the CRC cohort compared to controls, indicating that the cyclomodulin-producing E. coli strains may represent potentially preventable harmful newcomers in CRC patients. A colibactin gene assay showed the highest detection rate (45.4%), and males would benefit from the screening more than females. However, because of the high number of false positives, practical use of this marker must be explored. In our opinion, it may serve as an auxiliary marker to increase the specificity and/or sensitivity of the well-established fecal immunochemical test (FIT) in CRC screening.


Author(s):  
Patricio S Dalton ◽  
Julius Rüschenpöhler ◽  
Burak Uras ◽  
Bilal Zia

Business practices and performance vary widely across businesses within the same sector. A key outstanding question is why profitable practices do not readily diffuse. We conduct a field experiment among urban retailers in Indonesia to study whether alleviating informational and behavioral frictions can facilitate such diffusion in a cost-effective manner. Through quantitative and qualitative fieldwork, we curate a handbook that associates locally relevant practices with performance, and provides idiosyncratic implementation guidance informed by exemplary local retailers. We complement this handbook with two light-touch interventions to facilitate behavior change. A subset of retailers is invited to a documentary movie screening featuring the paths to success of exemplary peers. Another subset is offered two 30-minute personal visits by a local facilitator. A third group is offered both. Eighteen months later, we find significant impacts on practice adoption when the handbook is coupled with the two behavioral nudges, and up to a 35% increase in profits and a 16.7% increase in sales. These findings suggest both informational and behavioral constraints are at play. The types of practices adopted map the performance improvements to efficiency gains rather than other channels. A simple cost-benefit analysis shows such locally relevant knowledge can be codified and scaled successfully at relatively low cost.


2000 ◽  
Author(s):  
Ronald H. Miller ◽  
Gary S. Strumolo ◽  
Carlos Leon

In order to compete successfully in the global marketplace, manufacturers must have the ability to produce high-quality, low-cost products that fully satisfy the customer's needs. Many analytical techniques have been adopted by these manufacturers with the aim of improving customer and engineering quality. Nonlinear multivariate systems, however, complicate this process, making the determination of controlling factors difficult. Often, improvements in one area of the process or product compromise the performance in other areas. The Design of Experiments (DOE), pioneered by Taguchi, represents a powerful statistical method to help better understand nonlinear systems with the aim of improving quality and performance in engineering. A DOE using Computational Fluid Dynamics (CFD) is developed to better understand the influences of flow forces on valve design and performance. Geometric control factors for the spool valve are determined, enabling optimization and performance improvements.


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6841
Author(s):  
Sergio Cofre-Martel ◽  
Enrique Lopez Droguett ◽  
Mohammad Modarres

Sensor monitoring networks and advances in big data analytics have guided the reliability engineering landscape to a new era of big machinery data. Low-cost sensors, along with the evolution of the internet of things and industry 4.0, have resulted in rich databases that can be analyzed through prognostics and health management (PHM) frameworks. Several data-driven models (DDMs) have been proposed and applied for diagnostics and prognostics purposes in complex systems. However, many of these models are developed using simulated or experimental data sets, and there is still a knowledge gap for applications in real operating systems. Furthermore, little attention has been given to the required data preprocessing steps compared to the training processes of these DDMs. To date, research works do not follow a formal and consistent data preprocessing guideline for PHM applications. This paper presents a comprehensive step-by-step pipeline for the preprocessing of monitoring data from complex systems intended for DDMs. The importance of expert knowledge is discussed in the context of data selection and label generation. Two case studies are presented for validation, with the end goal of creating clean data sets with healthy and unhealthy labels that are then used to train machinery health state classifiers.


Author(s):  
M. A. Adesokan ◽  
K. O. Oriola ◽  
B. A. Ogundeji ◽  
O. W. Muhammed-Bashir

An easy-to-operate maize dehusker-sheller machine was constructed from locally available materials at relatively low cost at the premises of DAF Technical Services, Ilorin, Nigeria, between June 2017 and February 2018. The machine was constructed by sizing and marking out the plate using a scriber and cutter. The shaft was smoothed with sandpaper and the various components were welded; parts were assembled with fasteners (bolts and nuts). The machine consisted of four units (feeding unit, dehusking-shelling unit, cleaning unit and outlets). The results indicated mean dehusking efficiencies of 58.67%, 57.00% and 54.16% at speeds of 469 rpm, 309 rpm and 298 rpm respectively. The mean shelling efficiencies were 73.36%, 71.53% and 65.55% at the same speeds, and the mean throughput capacities were 55.90 kg/hr, 41.10 kg/hr and 36.00 kg/hr. The mean cleaning efficiencies were 79.97%, 79.77% and 82.23% at 469 rpm, 309 rpm and 298 rpm respectively. The mean grain losses were 20.37%, 21.20% and 17.16% at the three speeds. In conclusion, mean dehusking efficiency, shelling efficiency and throughput capacity were best at 469 rpm, while mean cleaning efficiency and mean grain loss were best at 298 rpm.


2019 ◽  
Vol 59 (2) ◽  
pp. 314
Author(s):  
J. K. Nyameasem ◽  
M. Akoloh ◽  
E. K. Adu

The potential of grasscutters (Thryonomys swinderianus) as a source of animal protein can be exploited with a better understanding of their nutrient requirements. This experiment was conducted to determine the protein requirement of growing grasscutters fed formulated diets containing forage meal. Twenty-four growing grasscutters, in groups of four, were randomly allotted to four treatment diets formulated to respectively supply 14, 16, 18 and 20% crude protein (CP). Parameters measured included daily feed intake, daily weight gain (growth rate), final bodyweight, feed conversion ratio and cost-to-gain ratio. Dietary protein significantly (P < 0.05) influenced daily weight gain, as well as the final liveweights of the animals. The mean daily weight gain of the animals fed the 18% CP diet was not significantly (P > 0.05) different from that of those fed the 20% CP diet (12.8 vs 11.7 g/day), but was significantly higher than that of animals fed the 16% (6.4 g/day) and 14% (7.0 g/day) CP diets. The mean feed conversion ratio of the animals fed the diet with 18% CP (4.1) was, however, only significantly (P < 0.05) different from that of animals fed diets with 16% (7.2) and 14% (6.3) CP. Given the overall economic importance of a low cost-to-gain ratio, and the profitability of the diets thereof, these results suggest that 18% is the optimum CP level for economically feeding growing grasscutters on formulated diets containing forage meal.


2000 ◽  
Vol 16 (2) ◽  
pp. 107-114 ◽  
Author(s):  
Louis M. Hsu ◽  
Judy Hayman ◽  
Judith Koch ◽  
Debbie Mandell

Summary: In the United States' normative population for the WAIS-R, differences (Ds) between persons' verbal and performance IQs (VIQs and PIQs) tend to increase with an increase in full scale IQs (FSIQs). This suggests that norm-referenced interpretations of Ds should take FSIQs into account. Two new graphs are presented to facilitate this type of interpretation. One of these graphs estimates the mean of absolute values of D (called typical D) at each FSIQ level of the US normative population. The other graph estimates the absolute value of D that is exceeded only 5% of the time (called abnormal D) at each FSIQ level of this population. A graph for the identification of conventional "statistically significant Ds" (also called "reliable Ds") is also presented. A reliable D is defined in the context of classical true score theory as an absolute D that is unlikely (p < .05) to be exceeded by a person whose true VIQ and PIQ are equal. As conventionally defined, reliable Ds do not depend on the FSIQ. The graphs of typical and abnormal Ds are based on quadratic models of the relation of sizes of Ds to FSIQs. These models are generalizations of models described in Hsu (1996). The new graphical method of identifying abnormal Ds is compared to the conventional Payne-Jones method of identifying these Ds. Implications of the three juxtaposed graphs for the interpretation of VIQ-PIQ differences are discussed.
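The definition of a reliable D can be illustrated with the standard textbook calculation from classical true score theory. This is a sketch under stated assumptions, not necessarily the exact procedure used in the paper. The standard error of measurement of each scale is taken as SD·sqrt(1 − reliability), the two errors are assumed independent, and the reliabilities 0.97 and 0.93 are hypothetical illustrative values. The function name `reliable_difference` is also an assumption.

```python
import math
from statistics import NormalDist


def reliable_difference(sd, reliability_a, reliability_b, alpha=0.05):
    """Smallest absolute difference between two scale scores that would be
    exceeded with probability < alpha when the true scores are equal."""
    # Standard error of measurement of each scale (classical true score theory).
    sem_a = sd * math.sqrt(1.0 - reliability_a)
    sem_b = sd * math.sqrt(1.0 - reliability_b)
    # Standard error of the difference, assuming independent measurement errors.
    se_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * se_diff


# Hypothetical inputs: IQ-style scales with SD 15 and assumed reliabilities.
threshold = reliable_difference(sd=15.0, reliability_a=0.97, reliability_b=0.93)
print(round(threshold, 2))
```

The threshold depends only on the scale SD and the two reliabilities, which is consistent with the statement that conventionally defined reliable Ds do not depend on the FSIQ.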

