SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

2018 ◽  
Vol 61 ◽  
pp. 863-905 ◽  
Author(s):  
Alberto Fernandez ◽  
Salvador Garcia ◽  
Francisco Herrera ◽  
Nitesh V. Chawla

The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its design, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data and is featured in a number of different software packages, from open source to commercial. In this paper, marking the fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges in extending SMOTE to Big Data problems.

2021 ◽  
Author(s):  
Shujuan Wang ◽  
Yuntao Dai ◽  
Jihong Shen ◽  
Jingxue Xuan

With the development of artificial intelligence, medical auxiliary diagnosis based on big data classification is regarded as a promising new technology. Because samples are collected under different conditions, medical big data is often imbalanced. Class imbalance has been reported to severely hinder the classification performance of many standard learning algorithms and has attracted a great deal of attention from researchers in different fields. To address this problem, this paper proposes an improved SMOTE algorithm based on the Normal distribution. Normal random sampling is used to expand the minority class, so that new sample points fall closer to the center of the minority class with higher probability. In addition, the distribution of the generated data is controlled through the characteristics of the Normal distribution, and the influence of the statistical characteristics of the original data on the selection of the variance parameter is analyzed in terms of inter-class distance and sample variance. Experiments show that, measured by AUC, OOB, F-value, and G-value, the proposed algorithm yields better classification results than the original SMOTE algorithm on the imbalanced Pima, WDBC, WPBC, Ionosphere, and Breast-cancer-wisconsin datasets.
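The contrast drawn in this abstract can be illustrated with a short sketch. The first function follows the widely known SMOTE recipe, which interpolates uniformly between a minority point and one of its nearest minority neighbors. The second is only a hypothetical reading of the abstract, not the authors' published formula: it steps from a minority point toward the class centroid by a fraction drawn from a Normal distribution. The names `normal_center_smote`, `mu`, and `sigma` and their default values are assumptions made for illustration.

```python
import math
import random


def nearest_neighbors(points, idx, k):
    """Return indices of the k minority points closest to points[idx]."""
    dists = [(math.dist(points[idx], p), j)
             for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(points, n_new, k=5, rng=None):
    """Classic SMOTE: uniform interpolation toward a random near neighbor."""
    rng = rng or random.Random(0)
    k = min(k, len(points) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        j = rng.choice(nearest_neighbors(points, i, k))
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(points[i], points[j])])
    return synthetic


def normal_center_smote(points, n_new, mu=0.5, sigma=0.2, rng=None):
    """Hypothetical variant (an assumption, not the paper's exact method):
    move from a minority point toward the class centroid by a fraction
    drawn from N(mu, sigma) and clipped to [0, 1]."""
    rng = rng or random.Random(0)
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(points)
        gap = min(1.0, max(0.0, rng.gauss(mu, sigma)))
        synthetic.append([a + gap * (c - a) for a, c in zip(x, center)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    print(smote(minority, 3, k=2))
    print(normal_center_smote(minority, 3))
```

In this sketch a larger `mu` pulls synthetic points closer to the centroid and a smaller `sigma` concentrates them; how the variance should actually be chosen is the subject of the paper's own analysis and is not reproduced here.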


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Shujuan Wang ◽  
Yuntao Dai ◽  
Jihong Shen ◽  
Jingxue Xuan

With the development of artificial intelligence, big data classification technology provides valuable support for research on medical auxiliary diagnosis. However, because samples are collected under different conditions, medical big data is often imbalanced. The class-imbalance problem has been reported as a serious obstacle to the classification performance of many standard learning algorithms. The SMOTE algorithm can be used to generate sample points randomly to improve the imbalance rate, but its application is limited by the marginalization of generated samples and by blind parameter selection. To address this problem, this paper proposes an improved SMOTE algorithm based on the Normal distribution, so that new sample points fall closer to the center of the minority class with higher probability, which avoids marginalization of the expanded data. Experiments show that classification performance is better when the proposed algorithm, rather than the original SMOTE algorithm, is used to expand the imbalanced Pima, WDBC, WPBC, Ionosphere, and Breast-cancer-wisconsin datasets. In addition, the parameter selection of the proposed algorithm is analyzed; in the designed experiments, classification performance was best when the chosen parameters best preserved the distribution characteristics of the original data.


Chelovek RU ◽  
2020 ◽  
pp. 217-220
Author(s):  
Natalia Rostova ◽  

The article analyzes the current state of affairs in philosophy in relation to the question «What is human?». In this regard, the author identifies two strategies: post-humanism and post-cosmism. The strategy of post-humanism is to deny the idea of human exceptionalism. Humanity becomes something that can be thought of apart from the human and understood as a right that extends to the non-human world. Post-cosmism, on the contrary, advocates the idea of the ontological otherness of the human. Responding to the challenges of anthropological catastrophe, its representatives propose a number of new anthropological projects.


MedienJournal ◽  
2017 ◽  
Vol 38 (4) ◽  
pp. 50-61 ◽  
Author(s):  
Jan Jagodzinski

This paper will first briefly map out the shift from disciplinary to control societies (what I call designer capitalism; the idea of control comes from Gilles Deleuze) in relation to surveillance and the mediation of life through screen cultures. The paper then shifts to the issues of digitalization in relation to big data, which threaten to continue closing off life as zoë, that is, life that is creative rather than captured by attention technologies through marketing techniques and surveillance. The last part of the paper develops the way artists are able to resist the big data archive by turning the data in on itself, offering viewers and participants a glimpse of the current state of manipulating desire and maintaining copyright in order to keep the future closed rather than potentially open.


2010 ◽  
Vol 27 (4) ◽  
pp. 45-67
Author(s):  
Sayed Sikandar Shah ◽  
Mek Wok Mahmud

As an intellectual process, critical thinking plays a dynamic role in reconstructing human thought. In Islamic legal thought, this intellectual tool was pivotal in building a full-fledged jurisprudential system during the golden age of Islamic civilization. With the solidification of the science of Islamic legal theory and the entrenchment of classical Islamic jurisprudence, this process abated somewhat. Recent Islamic revival movements have engendered a great zeal for reinstituting this process. The current state of affairs in constructing and reconstructing Islamic jurisprudence does not, however, by and large reflect the dynamic character of intellectual thought in this particular discipline. Thus this article attempts to briefly delineate this concept, unveil the reality on the ground, and identify some hands-on strategies for applying critical thinking in contemporary ijtihad.


Author(s):  
Farhan Zahid

Pakistan remains a country of vital importance for Al-Qaeda, primarily because of the organization's advent, rise, and shelter there, not to mention the support it has found in Pakistan during the last two decades. The emergence of Al-Qaeda in Pakistan can be traced back to the Afghan War (1979-89); after a brief sabbatical in Sudan, the Islamist terrorist group rose to prominence after shifting back to Afghanistan. It then became a global ‘Islamist’ terrorist entity while based in neighboring Afghanistan, and it found safe havens in the erstwhile tribal areas of Pakistan in the aftermath of the US invasion of Afghanistan in 2001. Prior to its formation in 1988 in Peshawar (Pakistan), it had worked as Maktab al-Khidmat (Services Bureau) during the Afghan War. It had its roots in Pakistan, which had become a transit point for extremists en route to Afghanistan during the War. All high-profile Al-Qaeda leaders, who later became high-value targets, and the members of its central Shura had lived in Pakistan at some point in their lives. That is the very reason Al-Qaeda in Pakistan is termed Al-Qaeda Core or Al-Qaeda Central among law enforcement practitioners and intelligence communities. Without going into the details of Al-Qaeda’s past in Pakistan, this article focuses on its current state of affairs and on what future lies ahead of it in Pakistan.


2020 ◽  
Vol 22 (5) ◽  
pp. 51-55
Author(s):  
OLEG N. KORCHAGIN ◽  
ANASTASIA V. LYADSKAYA

The article is devoted to the current state of digitalization aimed at solving urgent problems of combating corruption in public administration and the private business sector. The work considers the experience of foreign countries and the influence of digital technologies on the fight against corruption. It is noted that the digitalization of public administration is becoming one of the decisive factors in increasing the efficiency of the anti-corruption system and improving management mechanisms. Big Data, if integrated and structured according to the given parameters, allows legislative, law enforcement, and control and supervisory activities to be carried out reliably and transparently. Big Data tools allow us to analyze processes, identify dependencies, and predict corruption risks. The author describes the most significant problems that complicate the transfer of offline technologies into the online environment. The paper analyzes promising directions for the development of digital technologies that would help solve the problems that arise, as well as accomplish tasks that previously seemed unreachable. The article also describes current developments in collecting and managing large amounts of data, the “Internet of Things”, modern network architecture, and other advances in IT; the work provides applied examples of their potential use in combating corruption. The study argues that, in the context of combating corruption, digitalization should be treated as a separate area of activity that is controlled and regulated by the state.


2021 ◽  
Vol 13 (2) ◽  
pp. 703
Author(s):  
Megan Drewniak ◽  
Dimitrios Dalaklis ◽  
Anastasia Christodoulou ◽  
Rebecca Sheehan

In recent years, a continuous decline of ice coverage in the Arctic has been recorded, but these high latitudes are still dominated by the Earth’s polar ice cap. Therefore, safe and sustainable shipping operations in this still frozen region have as a precondition the availability of ice-breaking support. The analysis at hand provides an assessment of the United States’ and Canada’s polar ice-breaking programs, with the purpose of examining to what extent these countries’ relevant resources are able to meet the growth of industrial interests in the High North. This assessment focuses specifically on the maritime transportation sector along the Northwest Passage and consists of four main sections. The first provides a very brief description of the main Arctic passages. The second explores the current situation of the Northwest Passage, including the relevant navigational challenges, lack of infrastructure, available routes that may be used for transit, potential choke points, and the current state of vessel activity along these routes. The third examines the economic viability of the Northwest Passage compared to that of the Panama Canal; the fourth and final section investigates the current and future capabilities of the United States’ and Canada’s ice-breaking fleets. Unfortunately, both countries were found to lack the necessary assets with ice-breaking capabilities and will need to accelerate their efforts in order to respond effectively to the growing needs of the Arctic. The limited number of available ice-breaking assets negatively affects the level of support provided by the marine transportation systems of both the United States and Canada; the two countries may be unable to meet the expected future needs effectively because of the lengthy acquisition and production process required for new ice-breaking fleets.

