A Clustering Algorithm via Density Perception and Hierarchical Aggregation Based on Urban Multimodal Big Data for Identifying and Analyzing Categories of Poverty-Stricken Households in China

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Hui Liu ◽  
Yang Liu ◽  
Ran Zhang ◽  
Xia Wu

Poverty is a severe challenge for human society, and the successful completion of poverty-relief work can greatly improve people's quality of life and ensure the sustainable development of society. Nowadays, urban multimodal big data are freely available to the public thanks to the growing number of cities, and such data play a critical role in many fields such as transportation, education, medical treatment, and land resource management. It is therefore of great significance to apply machine learning to mine different categories of poverty-stricken households and provide decision support for poverty alleviation, since traditional poverty-alleviation methods consume a great deal of manpower, material, and financial resources. Based on density-based spatial clustering of applications with noise (DBSCAN), this paper designs a hierarchical DBSCAN clustering algorithm to identify and analyze the categories of poverty-stricken households in China. First, the proposed method adjusts the neighborhood radius dynamically to divide the data space into several initial clusters with different densities. Then, neighbor clusters are repeatedly identified by their border and inner distances and aggregated recursively to form new clusters. Through this combination of division and aggregation, the proposed method can recognize clusters of different shapes and handle noise effectively in data spaces with imbalanced density distributions. The experiments indicate that the method achieves strong clustering performance and identifies the common and distinguishing characteristics of poverty-stricken households reasonably well; in terms of the indicator "Accuracy," it improves by 2.3% over comparable methods.
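The paper's hierarchical variant builds on plain DBSCAN's division step. As a reference point, here is a minimal pure-Python DBSCAN sketch (a fixed `eps` stands in for the paper's dynamically adjusted radius, and `min_pts` counts the point itself; both choices are illustrative assumptions, not the authors' implementation):

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns labels[i] = cluster id, or -1 for noise."""
    labels = [None] * len(points)  # None = unvisited
    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:          # not a core point -> provisionally noise
            labels[i] = -1
            continue
        labels[i] = cid                  # start a new cluster from this core point
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:          # noise reachable from a core point: border point
                labels[j] = cid
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:       # j is itself a core point: keep expanding
                seeds.extend(jn)
        cid += 1
    return labels

points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (20, 20)]
labels = dbscan(points, eps=0.5, min_pts=2)  # two clusters plus one noise point
```

The hierarchical step described in the abstract would then treat each resulting cluster as a node and recursively merge neighbouring clusters whose border distance is small relative to their inner distance.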

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hui Liu ◽  
Yang Liu ◽  
Zhenquan Qin ◽  
Ran Zhang ◽  
Zheng Zhang ◽  
...  

Big data technology has developed rapidly in recent years. This paper studies a performance-improvement mechanism for targeted poverty alleviation based on big data technology, to further promote its comprehensive application in poverty alleviation and development. Using data mining to accurately identify the poor population under a big data framework is clearly more accurate and persuasive than traditional identification methods, and it also helps uncover the real causes of poverty and assist poor residents in the future. In current targeted poverty-alleviation work, the identification of poor households and the matching of assistance measures rely mainly on visits by village cadres and the establishment of documents. These traditional methods are time-consuming, laborious, and difficult to manage, and they often omit useful family information. Therefore, new technologies need to be introduced to realize intelligent identification of poverty-stricken households and reduce labor costs. In this paper, we introduce a novel DBSCAN clustering algorithm via an edge computing-based deep neural network model for targeted poverty alleviation. First, we deploy an edge computing-based deep neural network model. Then, within this model, we perform data mining on poverty-stricken families: the DBSCAN clustering algorithm is used to excavate the poverty features of poor households and complete their intelligent identification. Given the high-dimensional, large-volume nature of poverty-alleviation data, the algorithm uses the relative density difference of a grid to divide the data space into regions with different densities and then applies DBSCAN to cluster the result, which improves the accuracy of DBSCAN and avoids traversing all data when searching for density connections.
Finally, the proposed method is used to analyze and mine the poverty-alleviation data. The average accuracy exceeds 96%, and the average F-measure, NMI, and PRE values exceed 90%. The results show that the method provides decision support for the precise matching and intelligent pairing of village cadres in poverty-alleviation work.
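The grid-partition step can be sketched as follows. The abstract only states that the split uses the relative density difference of the grid, so the cell size and the dense/sparse rule below (compare each cell's count to the mean) are assumptions:

```python
from collections import defaultdict

def grid_counts(points, cell):
    """Bin 2-D points into square grid cells and count points per cell."""
    counts = defaultdict(int)
    for x, y in points:
        counts[(int(x // cell), int(y // cell))] += 1
    return counts

def split_by_relative_density(counts):
    """Split cells into dense/sparse regions relative to the mean cell density."""
    mean = sum(counts.values()) / len(counts)
    dense = {c for c, n in counts.items() if n >= mean}
    sparse = set(counts) - dense
    return dense, sparse

points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4), (0.2, 0.2), (3.5, 3.5)]
dense, sparse = split_by_relative_density(grid_counts(points, cell=1.0))
```

DBSCAN would then be run within each region separately, so the density-connection search never has to traverse the whole data set.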


2022 ◽  
pp. 297-315
Author(s):  
Mohammad Tariq Intezar ◽  
Saad Bin Zia

Muslims are the largest minority in India, yet the federal government has in place just a single Muslim-specific poverty-alleviation scheme, which is utterly insufficient to meet their financial needs. Hence, in the face of governmental apathy and indifference, Muslims are left to fend for themselves. In this scenario, Zakāt turns out to be a more-than-handy tool for alleviating poverty among Muslims in India. Over the years, Zakāt has proved a successful means of meeting the financial needs of developmental activities at every level, including in non-Muslim countries. Zakāt possesses robust potential to play a critical role in implementing the Sustainable Development Goals (SDGs) and alleviating the poverty of Muslims in a Hindu-majority country like India.


2016 ◽  
Vol 25 (3) ◽  
pp. 431-440 ◽  
Author(s):  
Archana Purwar ◽  
Sandeep Kumar Singh

Abstract
Ensuring data quality is an important task in data mining, since the validity of mining algorithms is reduced if the data are not of good quality. Data quality can be assessed in terms of missing values (MV) as well as the noise present in the data set. Various imputation techniques have been studied in MV research, but little attention has been given to noise in earlier work. Moreover, to the best of our knowledge, no one has used density-based spatial clustering of applications with noise (DBSCAN) for MV imputation. This paper proposes a novel technique, density-based imputation (DBSCANI), built on density-based clustering to deal with incomplete values in the presence of noise. The density-based clustering algorithm proposed by Kriegel et al. groups objects according to their density in spatial databases: high-density regions form clusters, while low-density regions contain the noise objects in the data set. Extensive experiments were performed on the Iris data set from the life-science domain and Jain's (2D) data set from the shape data sets. The performance of the proposed method is evaluated using root mean square error (RMSE) and compared with the existing K-means imputation (KMI). Results show that our method is more noise-resistant than KMI on the data sets under study.
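DBSCANI itself is not spelled out beyond this description. One plausible sketch imputes each missing entry from the column mean of the row's cluster; the cluster labels are assumed to come from DBSCAN run on the observed dimensions, and noise rows (label -1) fall back to the global mean:

```python
def impute_by_cluster(rows, labels):
    """Replace None entries with the column mean of the row's cluster.

    rows   -- list of equal-length lists, with None marking a missing value
    labels -- cluster id per row (-1 = noise, which falls back to all rows)
    """
    def col_mean(idx, members):
        vals = [r[idx] for r in members if r[idx] is not None]
        return sum(vals) / len(vals)
    out = []
    for r, lab in zip(rows, labels):
        # rows sharing this cluster; noise rows fall back to the whole data set
        members = [q for q, l in zip(rows, labels) if l == lab and l != -1] or rows
        out.append([v if v is not None else col_mean(i, members)
                    for i, v in enumerate(r)])
    return out

rows = [[1.0, 2.0], [1.0, None], [10.0, 20.0]]
filled = impute_by_cluster(rows, labels=[0, 0, 1])
```

Because noise rows never contribute to a cluster's means, noisy records cannot distort the imputed values, which is the noise-resistance the paper claims over K-means imputation.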


2019 ◽  
Vol 37 (2) ◽  
pp. 225-239 ◽  
Author(s):  
Hongqi Han ◽  
Yongsheng Yu ◽  
Lijun Wang ◽  
Xiaorui Zhai ◽  
Yaxin Ran ◽  
...  

Purpose – The aim of this study is to present a novel approach based on semantic fingerprinting and the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which converts inventor records into 128-bit semantic fingerprints. Inventor disambiguation is the task of discovering the unique set of underlying inventors and mapping a set of patents to their corresponding inventors. Resolving ambiguities between inventors is necessary to improve the quality of the patent database and to ensure accurate entity-level analysis. Most existing methods are based on machine learning and, while they often show good performance, this comes at the cost of time, computational power, and storage space.
Design/methodology/approach – The meta and textual data in inventor records are converted into 128-bit semantic fingerprints. Rather than using a string comparison or cosine similarity to calculate the distance between pairs of fingerprint records, a binary-number comparison function is used in DBSCAN, which then clusters the inventor records based on this distance to disambiguate inventor names.
Findings – Experiments conducted on the PatentsView campaign database of the United States Patent and Trademark Office show that this method disambiguates inventor names with recall greater than 99 per cent, in less time and with substantially smaller storage requirements.
Research limitations/implications – A better semantic fingerprint algorithm and a better distance function may improve precision. Setting different clustering parameters for each block, or using other clustering algorithms, will be considered to improve the accuracy of the disambiguation results even further.
Originality/value – Compared with existing methods, the proposed method does not rely on feature selection or complex feature-comparison computation. Most importantly, running time and storage requirements are drastically reduced.
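The binary-number comparison that replaces string or cosine similarity is presumably Hamming distance on the 128-bit fingerprints, which in Python reduces to a popcount of an XOR. The matching threshold below is purely illustrative:

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints stored as ints."""
    return bin(a ^ b).count("1")

def same_inventor(fp1: int, fp2: int, threshold: int = 16) -> bool:
    """Treat two 128-bit fingerprints as one inventor if few bits differ."""
    return hamming(fp1, fp2) <= threshold

d = hamming(0b10110010, 0b10011010)  # the two bitstrings differ in two positions
```

This distance is a handful of machine operations per pair, which is why the approach avoids the feature-comparison cost of the machine-learning baselines.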


Author(s):  
Andrius Daranda ◽  
Gintautas Dzemyda

In recent years, marine traffic has increased dramatically. Marine traffic safety depends heavily on mariners' decisions in particular situations: the watch officer must continuously observe the traffic for anomalies, because anomaly detection is crucial for predicting dangerous situations and making timely decisions for safe marine navigation. In this paper, we present marine traffic anomaly detection based on the combination of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm with k-nearest neighbors analysis among the clusters and particular vessels. The clustering algorithm is applied to historic marine traffic data – a set of vessel turn points. In our experiments, the total number of turn points was about 3 million, occupying about 160 megabytes of storage. A formal numerical criterion for comparing an anomaly with the normal traffic flow has been proposed, giving us the possibility of detecting vessels outside the typical traffic pattern. The proposed method supports the right decisions in different oceanic-scale or hydrometeorological conditions when detecting an anomalous vessel situation.
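The combination with k-nearest-neighbour analysis suggests an anomaly score such as the distance from a new turn point to its k-th nearest historical turn point; the choice of k and any alert threshold are assumptions, since the paper's exact criterion is not reproduced here:

```python
from math import dist

def knn_anomaly_score(point, history, k=3):
    """Distance to the k-th nearest historical turn point (larger = more anomalous)."""
    d = sorted(dist(point, q) for q in history)
    return d[min(k, len(d)) - 1]

history = [(0, 0), (0, 1), (1, 0), (1, 1)]        # dense, "normal" traffic pattern
normal = knn_anomaly_score((0.5, 0.5), history)   # small: inside the pattern
anomaly = knn_anomaly_score((10, 10), history)    # large: outside the pattern
```

A vessel whose score exceeds the typical within-cluster k-distance would be flagged as outside the learned traffic pattern.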


Energies ◽  
2021 ◽  
Vol 14 (19) ◽  
pp. 6404
Author(s):  
Hui Zhou ◽  
Zesen Gui ◽  
Jiang Zhang ◽  
Qun Zhou ◽  
Xueshan Liu ◽  
...  

Based on outlier-detection algorithms, a feasible quantification method for supraharmonic emission signals is presented. It is designed to meet the requirements of high resolution and low data volume simultaneously in the frequency domain. The proposed method was developed from a skewed-distribution data model and self-tuning of the parameters of the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Specifically, the data distribution of the supraharmonic band was first analyzed by the Jarque–Bera test, and a threshold was determined from the distribution model to filter out noise. Subsequently, the DBSCAN parameters were adjusted automatically according to the slope variation of the k-dist curve and a binary-search (dichotomy) parameter-seeking algorithm, followed by the clustering itself; the supraharmonic emission points were identified as outliers. Finally, simulated and experimental data were used to verify the effectiveness of the proposed method. From the detection results, a spectrum with the same resolution as the original was obtained, while the amount of data declined by more than three orders of magnitude compared to the original spectrum. The presented method will benefit the quantitative analysis of the amplitude and frequency of supraharmonic emissions.
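A common way to self-tune DBSCAN's ε from the k-dist curve is to sort each point's k-th-nearest-neighbour distance and place ε at the largest jump. The paper's binary-search refinement is omitted here; this simple knee heuristic is an assumption standing in for it:

```python
from math import dist

def kdist_eps(points, k):
    """Pick eps at the knee (largest jump) of the sorted k-dist curve."""
    kd = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        kd.append(ds[k - 1])                 # distance to the k-th nearest neighbour
    kd.sort()
    jumps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    knee = jumps.index(max(jumps))           # steepest rise in the k-dist curve
    return kd[knee]

points = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1), (10, 10)]
eps = kdist_eps(points, k=2)   # lands at the cluster scale, not the outlier's
```

With ε chosen this way, points beyond the knee (here the isolated point at (10, 10)) fall outside every ε-neighbourhood and are reported as outliers — the supraharmonic emission points in the paper's setting.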


2020 ◽  
Vol 71 (3) ◽  
pp. 138-149
Author(s):  
Wael Farag

Abstract
In this paper, a real-time road-object detection and tracking (LR_ODT) method for autonomous driving, based on the fusion of lidar and radar measurement data, is proposed. The lidar and radar devices are installed on the ego car, and a customized Unscented Kalman Filter (UKF) is used for their data fusion. Lidars are accurate at determining objects' positions but significantly less accurate at measuring their velocities; radars, conversely, are more accurate at measuring velocities but less accurate at determining positions, as they have a lower spatial resolution. The proposed fusion approach therefore combines the merits of both sensors to provide precise pose and velocity data for objects moving on roads. The Grid-Based Density-Based Spatial Clustering of Applications with Noise (GB-DBSCAN) algorithm is used to detect objects and estimate their centroids from the raw lidar and radar data, and the RANdom SAmple Consensus (RANSAC) algorithm then estimates each object's velocity and geometrical shape. The technique is implemented in the high-performance language C++ and utilizes highly optimized math and optimization libraries for the best real-time performance. The performance of the UKF fusion is compared to that of Extended Kalman Filter (EKF) fusion, showing its superiority. Simulation studies have been carried out to evaluate the performance of the LR_ODT for tracking bicycles, cars, and pedestrians.
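RANSAC's role here (fitting object geometry to noisy sensor returns) can be illustrated with the classic line-fit version: repeatedly sample a minimal subset, fit a model, and keep the one with the most inliers. The `iters` and `tol` values are illustrative, not taken from the paper:

```python
import random

def ransac_line(points, iters=200, tol=0.1, seed=0):
    """Fit y = m*x + b robustly: best model = most points within tol of the line."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample: 2 points
        if x1 == x2:
            continue                                  # vertical line; skip this sample
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (m, b), inliers
    return best, best_inliers

pts = [(x, 2 * x) for x in range(10)] + [(3, 40), (7, -5)]  # a line plus 2 outliers
(m, b), inliers = ransac_line(pts)
```

The consensus step is what makes the shape and velocity estimates robust to spurious returns that a least-squares fit would be dragged toward.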


Author(s):  
Kawtar Sabor ◽  
Damien Jougnot ◽  
Roger Guerin ◽  
Barthélémy Steck ◽  
Jean-Marie Henault ◽  
...  

Summary
Geophysical imaging using the inversion procedure is a powerful tool for exploring the Earth's subsurface. However, interpreting inverted images can be difficult, because of the inherent limitations of existing inversion algorithms, which produce smoothed sections. To improve and automate the processing and interpretation of inverted geophysical models, we propose an approach inspired by data mining. We selected the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to cluster inverted geophysical sections. The methodology relies on the automatic sorting and clustering of data: DBSCAN detects clusters in the inverted electrical-resistivity values with no prior knowledge of the number of clusters. The algorithm has the advantage of being defined by only two parameters – the neighbourhood of a point in the data space, and the minimum number of data points in this neighbourhood – and we propose an objective procedure for determining both. The proof of concept described here is applied to simulated ERT (Electrical Resistivity Tomography) sections for three cases: two layers with a step, two layers with a rebound, and two layers with an anomaly embedded in the upper layer. To validate the approach, sensitivity studies were carried out on both parameters, as well as on the influence of noise on the algorithm's performance. Finally, the methodology was tested on real field data. DBSCAN detects clusters in the inverted electrical-resistivity models, and these clusters are then associated with various types of earth materials, allowing the structure of the prospected area to be determined. The proposed data-mining algorithm is shown to be effective and to improve the interpretation of inverted ERT sections.
This new approach has considerable potential, as it can be applied to any geophysical data represented in the form of sections or maps.
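In the paper, DBSCAN works on the full inverted section (position plus resistivity). A much-reduced 1-D sketch of the same density idea clusters log-resistivity values wherever consecutive sorted values are close; the 0.3-decade gap and the example values are assumptions for illustration only:

```python
from math import log10

def cluster_resistivities(rho, gap=0.3):
    """Group resistivity values: a new cluster starts at a gap > `gap` decades."""
    order = sorted(range(len(rho)), key=lambda i: rho[i])
    labels = [0] * len(rho)
    cid = 0
    for a, b in zip(order, order[1:]):
        if log10(rho[b]) - log10(rho[a]) > gap:
            cid += 1                      # large jump in log-resistivity: new material
        labels[b] = cid
    return labels

rho = [10.0, 12.0, 11.0, 1000.0, 900.0]   # two contrasting ranges of resistivity
labels = cluster_resistivities(rho)       # two clusters, one per material type
```

Each cluster of resistivity values would then be associated with an earth material, as in the paper's interpretation step.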


Author(s):  
Michael Goul ◽  
T. S. Raghu ◽  
Ziru Li

As procurement organizations increasingly move from a cost-and-efficiency emphasis to a profit-and-growth emphasis, flexible data architecture will become an integral part of a procurement analytics strategy. It is therefore imperative for procurement leaders to understand and address digitization trends in supply chains and to develop strategies for robust data architectures and analytics. This chapter assesses and examines the ways companies can organize their procurement data architectures in the big data space to mitigate current limitations and to lay foundations for the discovery of new insights. It sets out to understand and define the levels of maturity in procurement organizations as they pertain to the capture, curation, exploitation, and management of procurement data. The chapter then develops a framework for articulating the value proposition of moving between maturity levels and examines what the future holds for companies with mature data architectures. In addition to surveying the practitioner and academic literature on procurement data analytics, the chapter presents detailed, structured interviews with over fifteen procurement experts from companies around the globe. It identifies several important strategies that have helped procurement organizations design strategic roadmaps for developing robust data architectures, distinguishes four archetypal procurement data-architecture contexts, details an exemplary high-level mature data architecture for each archetype, and examines the critical assumptions underlying each one. Data architectures built for the future need a design approach that supports both descriptive and real-time, prescriptive analytics.

