Energy Efficiency of Inference Algorithms for Medical Datasets: A Green AI study (Preprint)

2021 ◽  
Author(s):  
Jia-Ruei Yu ◽  
Chun-Hsien Chen ◽  
Tsung-Wei Huang ◽  
Jang-Jih Lu ◽  
Chia-Ru Chung ◽  
...  

BACKGROUND Harnessing artificial intelligence (AI) in the medical domain has attracted considerable interest recently. An AI model must be energy-efficient if it is to be used for inference applications in the medical domain. Unlike the data typically used in visual AI, medical data are usually composed of features carrying strong signals. Numerous energy optimization techniques have been developed to relieve the burden on the hardware required to deploy a complex learning model. However, the energy efficiencies of different AI models used for medical applications have not yet been studied.
OBJECTIVE To explore and compare the energy efficiencies of widely used machine learning (ML) algorithms, including logistic regression (LR), k-nearest neighbors (kNN), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and two different neural networks (NN), on medical datasets.
METHODS We applied the algorithms above to two distinct medical datasets: the mass spectrometry data of Staphylococcus aureus for predicting methicillin resistance (“Mass spectrometry” dataset: 3338 cases; 268 features), and the urinalysis data for predicting Trichomonas vaginalis infection (“Urinalysis” dataset: 839,164 cases; 9 features). We compared the performance of these seven inference algorithms in terms of accuracy, area under the receiver operating characteristic curve (AUROC), time consumption, and power consumption. Time and power consumption were determined from the performance counter data of Intel Power Gadget 3.5.
RESULTS Experimental results showed that the RF and XGB algorithms achieved the two highest AUROC scores on both datasets (84.7% and 83.9%, respectively, on the “Mass spectrometry” dataset, and 91.1% and 91.4%, respectively, on the “Urinalysis” dataset). In terms of time consumption, the XGB, 1-hidden-layer NN, and LR algorithms were the fastest on both datasets: with RF as the reference baseline, they achieved a 45% reduction in inference time on the “Mass spectrometry” dataset and 53-60% reductions on the “Urinalysis” dataset. In terms of energy efficiency, XGB, LR, SVM, and RF consumed the least power: with the 5-hidden-layer NN as the reference baseline, they achieved 24-32% reductions in power consumption on the “Mass spectrometry” dataset and 20-53% reductions on the “Urinalysis” dataset. Among all experiments, XGB achieved the best overall performance across accuracy, runtime, and energy efficiency.
CONCLUSIONS In the current study, XGB attained a balanced performance across accuracy, runtime, and energy efficiency on the medical datasets. These results indicate that XGB would be a suitable algorithm for applying ML in real-world medical scenarios.
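A minimal sketch of the kind of inference benchmark described above, using scikit-learn and XGBoost on synthetic data. The dataset shape, model settings, and timing loop are illustrative assumptions; the study additionally measured power via Intel Power Gadget counters, which is not replicated here.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Synthetic stand-in for e.g. the mass-spectrometry dataset (3338 cases, 268 features).
X, y = make_classification(n_samples=3338, n_features=268, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "kNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),
    "RF": RandomForestClassifier(n_estimators=200),
    "XGB": XGBClassifier(eval_metric="logloss"),
    "NN-1": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    start = time.perf_counter()
    proba = model.predict_proba(X_test)[:, 1]  # inference step only, as in the study
    elapsed = time.perf_counter() - start
    print(f"{name}: AUROC={roc_auc_score(y_test, proba):.3f}, inference={elapsed*1e3:.1f} ms")
```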

Energies ◽  
2021 ◽  
Vol 14 (14) ◽  
pp. 4089
Author(s):  
Kaiqiang Zhang ◽  
Dongyang Ou ◽  
Congfeng Jiang ◽  
Yeliang Qiu ◽  
Longchuan Yan

In terms of power and energy consumption, DRAM plays a key role in a modern server system, alongside the processors. Although power-aware scheduling is based on the proportion of energy consumed by DRAM relative to other components, when memory-intensive applications are running, the energy consumption of the whole server system is significantly affected by the energy non-proportionality of DRAM. Furthermore, modern servers usually adopt the NUMA architecture in place of the original SMP architecture to increase memory bandwidth, so it is of great significance to study the energy efficiency of these two memory architectures. Therefore, in order to explore the power consumption characteristics of servers under memory-intensive workloads, this paper evaluates the power consumption and performance of memory-intensive applications on different generations of real rack servers. Through this analysis, we find that: (1) workload intensity and the number of concurrently executing threads affect server power consumption, but a fully utilized memory system does not necessarily yield good energy efficiency; (2) even if the memory system is not fully utilized, the memory capacity available per processor core has a significant impact on application performance and server power consumption; (3) when running memory-intensive applications, memory utilization is not always a good indicator of server power consumption; and (4) reasonable use of the NUMA architecture improves memory energy efficiency significantly. The experimental results show that reasonable use of the NUMA architecture can improve memory energy efficiency by 16% compared with the SMP architecture, whereas unreasonable use of the NUMA architecture reduces it by 13%. The findings presented in this paper provide useful insights and guidance to help system designers and data center operators with energy-efficiency-aware job scheduling and energy conservation.
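A minimal sketch of comparing NUMA placement policies for a memory-intensive workload, in the spirit of the evaluation above. It assumes a Linux host with the numactl utility and at least two NUMA nodes; the node IDs, the synthetic workload, and the use of wall-clock time as a proxy for efficiency are all illustrative assumptions (the study measured actual server power).

```python
import subprocess
import time

# Memory-streaming workload: allocate and repeatedly sum a large array (~1.6 GB).
workload = (
    "import numpy as np; "
    "a = np.ones(200_000_000); "
    "print(sum(a.sum() for _ in range(10)))"
)

policies = {
    "local (CPU and memory on node 0)": ["numactl", "--cpunodebind=0", "--membind=0"],
    "remote (CPU on node 0, memory on node 1)": ["numactl", "--cpunodebind=0", "--membind=1"],
    "interleaved across all nodes": ["numactl", "--interleave=all"],
}

for label, prefix in policies.items():
    start = time.perf_counter()
    subprocess.run(prefix + ["python3", "-c", workload], check=True,
                   stdout=subprocess.DEVNULL)
    print(f"{label}: {time.perf_counter() - start:.2f} s")
```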


2021 ◽  
Vol 11 (2) ◽  
pp. 796
Author(s):  
Alhanoof Althnian ◽  
Duaa AlSaeed ◽  
Heyam Al-Baity ◽  
Amani Samha ◽  
Alanoud Bin Dris ◽  
...  

Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of the classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the models most robust to limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a machine learning model that is robust to limited data does not necessarily provide the best performance compared with other models.
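A minimal sketch of a dataset-size reduction experiment of the kind described above: the same classifiers are trained on progressively smaller stratified subsets and evaluated on a fixed held-out test set. The dataset, the reduction fractions, and the model settings are illustrative assumptions, not the study's exact protocol.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "SVM": SVC(probability=True),
    "NN": MLPClassifier(max_iter=1000),
    "DT": DecisionTreeClassifier(),      # stand-in for C4.5
    "RF": RandomForestClassifier(),
    "AB": AdaBoostClassifier(),
    "NB": GaussianNB(),
}

for fraction in (1.0, 0.5, 0.25, 0.1):   # full set plus three reduction scenarios
    if fraction < 1.0:
        X_sub, _, y_sub, _ = train_test_split(
            X_train, y_train, train_size=fraction, stratify=y_train, random_state=0)
    else:
        X_sub, y_sub = X_train, y_train
    aucs = {name: roc_auc_score(y_test, m.fit(X_sub, y_sub).predict_proba(X_test)[:, 1])
            for name, m in models.items()}
    print(f"{int(fraction*100):>3}% of training data:",
          ", ".join(f"{n}={a:.3f}" for n, a in aucs.items()))
```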


2015 ◽  
Author(s):  
Lisa M. Breckels ◽  
Sean Holden ◽  
David Wojnar ◽  
Claire M. Mulvey ◽  
Andy Christoforou ◽  
...  

Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system to integrate heterogeneous data sources and considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species, in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.
Abbreviations: LOPIT, localisation of organelle proteins by isotope tagging; PCP, protein correlation profiling; ML, machine learning; TL, transfer learning; SVM, support vector machine; PCA, principal component analysis; GO, Gene Ontology; CC, cellular compartment; iTRAQ, isobaric tags for relative and absolute quantitation; TMT, tandem mass tags; MS, mass spectrometry.
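A minimal Python sketch of the transfer-learning idea described above: class votes from a nearest-neighbour classifier on the primary (quantitative MS) data are combined with votes from a classifier trained on an auxiliary data source, using a single global weight. The real method is distributed in the R/Bioconductor pRoloc package and optimizes class-specific weights; the synthetic data, the single weight theta, and the vote-combination rule here are simplifying assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)

# Synthetic stand-ins: primary MS profiles and a weaker auxiliary feature space
# (e.g. derived from GO annotations) for the same proteins and compartments.
X_primary, y = make_classification(n_samples=600, n_features=20, n_informative=15,
                                   n_classes=4, random_state=0)
X_aux = X_primary[:, :8] + rng.normal(scale=2.0, size=(600, 8))

idx_train, idx_test = train_test_split(np.arange(600), test_size=0.3,
                                       stratify=y, random_state=0)

knn_primary = KNeighborsClassifier(n_neighbors=5).fit(X_primary[idx_train], y[idx_train])
knn_aux = KNeighborsClassifier(n_neighbors=5).fit(X_aux[idx_train], y[idx_train])

theta = 0.7  # weight on the primary data; pRoloc tunes such weights per class
combined = (theta * knn_primary.predict_proba(X_primary[idx_test])
            + (1 - theta) * knn_aux.predict_proba(X_aux[idx_test]))
pred = combined.argmax(axis=1)

print("combined accuracy:", (pred == y[idx_test]).mean())
print("primary-only accuracy:",
      (knn_primary.predict(X_primary[idx_test]) == y[idx_test]).mean())
```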


2019 ◽  
Vol 8 (3) ◽  
pp. 4265-4271

Software testing is an essential activity in the software industry for quality assurance, as it can effectively remove defects before software deployment. A good software testing strategy accomplishes the fundamental testing objective while balancing the trade-off between testing effectiveness and efficiency. The Adaptive and Random Partition software Testing (ARPT) approach combines Adaptive Testing (AT) and the Random Partition approach (RPT) to test software effectively. It has two variants, ARPT-1 and ARPT-2. In ARPT-1, AT selects a certain number of test cases, RPT then selects a number of test cases, and the strategy returns to AT. In ARPT-2, AT selects the first m test cases and then switches to RPT for the remaining tests. The computational complexity of random partitioning in ARPT was addressed by clustering the test cases with different clustering algorithms. The parameters of ARPT-1 and ARPT-2 need to be estimated for each piece of software, which leads to high computational overhead and time consumption; this was addressed with an improvised BAT optimization algorithm, yielding Optimized ARPT1 (OARPT1) and OARPT2. However, using all test cases in OARPT still leads to high time consumption and computational overhead. To avoid this problem, OARPT1 with a Support Vector Machine (OARPT1-SVM) and OARPT2-SVM are introduced in this paper. The SVM selects the best test cases for the OARPT-1 and OARPT-2 testing strategies: it constructs a hyperplane in a multi-dimensional space that separates test cases with high code and branch coverage from test cases with low code and branch coverage, and the selected test cases are then used by OARPT-1 and OARPT-2 to test the software. In the experiments, three different software systems are used to demonstrate the effectiveness of the proposed OARPT1-SVM and OARPT2-SVM testing strategies in terms of time consumption, defect detection efficiency, branch coverage, and code coverage.
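A minimal sketch of the SVM-based test-case selection idea described above: an SVM decision boundary separates test cases expected to give high code/branch coverage from those expected to give low coverage, and only the former are handed to the testing strategy. The per-test-case features, the labelling rule, and the coverage threshold are illustrative assumptions, not the paper's actual feature set.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(42)

# Hypothetical per-test-case features, e.g. [statements touched, branches touched,
# execution time], normalised to [0, 1] and collected from earlier runs.
features = rng.uniform(0, 1, size=(300, 3))
# Hypothetical labels: 1 if the case previously achieved high combined coverage.
labels = (0.6 * features[:, 0] + 0.4 * features[:, 1] > 0.5).astype(int)

train, candidates = features[:200], features[200:]
svm = SVC(kernel="rbf").fit(train, labels[:200])

# Keep only the candidate test cases predicted to yield high coverage.
selected = np.where(svm.predict(candidates) == 1)[0]
print(f"selected {len(selected)} of {len(candidates)} candidate test cases "
      "for the ARPT testing strategy")
```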


2021 ◽  
Vol 13 (20) ◽  
pp. 11554
Author(s):  
Fahad Haneef ◽  
Giovanni Pernigotto ◽  
Andrea Gasparella ◽  
Jérôme Henri Kämpf

Nearly zero-energy buildings are now a standard for new constructions. However, the real challenge for a decarbonized society lies in the renovation of the existing building stock, selecting energy efficiency measures that consider not only energy performance but also economic and sustainability performance. Even though the literature is full of examples coupling building energy simulation with multi-objective optimization to identify the best measures, the adoption of such approaches is still limited at the district and urban scale, often because of incomplete input data and high computational requirements. In this research, a new methodology is proposed, combining the detailed geometric characterization of urban simulation tools with the simplification provided by “building archetype” modeling, in order to ensure the development of robust models for the multi-objective optimization of retrofit interventions at district scale. Using CitySim as an urban-scale energy modeling tool, a residential district built in the 1990s in Bolzano, Italy, was studied. Different sets of renovation measures for the building envelope were compared against three objectives, i.e., energy, economic, and sustainability performance. Despite energy savings of 29 to 46%, energy efficiency measures applied only to the building envelope were found to be insufficient to meet carbon neutrality goals without interventions on the building systems, in particular mechanical ventilation with heat recovery. Furthermore, public subsidies proved necessary, since none of the proposed measures pays back its initial investment for this case study.
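A minimal sketch of the multi-objective comparison of retrofit measures described above: each candidate measure is scored on three objectives to be minimized, and only the non-dominated (Pareto-optimal) measures are retained. The candidate measures and their numbers are invented for illustration; in the study such objective values come from CitySim simulations of the district combined with economic and sustainability indicators.

```python
from dataclasses import dataclass

@dataclass
class Measure:
    name: str
    energy_kwh_m2: float   # annual heating demand after retrofit (minimize)
    cost_eur_m2: float     # investment cost (minimize)
    co2_kg_m2: float       # embodied carbon of the intervention (minimize)

candidates = [
    Measure("external insulation 10 cm", 55.0, 120.0, 18.0),
    Measure("external insulation 20 cm", 48.0, 160.0, 30.0),
    Measure("internal insulation 10 cm", 58.0, 130.0, 20.0),
    Measure("window replacement", 62.0, 90.0, 12.0),
    Measure("insulation 10 cm + windows", 43.0, 200.0, 28.0),
    Measure("roof insulation only", 66.0, 60.0, 10.0),
]

def dominates(a: Measure, b: Measure) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    objs = lambda m: (m.energy_kwh_m2, m.cost_eur_m2, m.co2_kg_m2)
    return all(x <= y for x, y in zip(objs(a), objs(b))) and objs(a) != objs(b)

pareto = [m for m in candidates
          if not any(dominates(other, m) for other in candidates)]
for m in pareto:
    print("Pareto-optimal:", m.name)
```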

