scholarly journals A Quantification Method for Supraharmonic Emissions Based on Outlier Detection Algorithms

Energies ◽  
2021 ◽  
Vol 14 (19) ◽  
pp. 6404
Author(s):  
Hui Zhou ◽  
Zesen Gui ◽  
Jiang Zhang ◽  
Qun Zhou ◽  
Xueshan Liu ◽  
...  

Based on outlier detection algorithms, a feasible quantification method for supraharmonic emission signals is presented. It is designed to tackle the requirements of high-resolution and low data volume simultaneously in the frequency domain. The proposed method was developed from the skewed distribution data model and the self-tuning parameters of density-based spatial clustering of applications with noise (DBSCAN) algorithm. Specifically, the data distribution of the supraharmonic band was analyzed first by the Jarque–Bera test. The threshold was determined based on the distribution model to filter out noise. Subsequently, the DBSCAN clustering algorithm parameters were adjusted automatically, according to the k-dist curve slope variation and the dichotomy parameter seeking algorithm, followed by the clustering. The supraharmonic emission points were analyzed as outliers. Finally, simulated and experimental data were applied to verify the effectiveness of the proposed method. On the basis of the detection results, a spectrum with the same resolution as the original spectrum was obtained. The amount of data declined by more than three orders of magnitude compared to the original spectrum. The presented method will benefit the analysis of quantification for the amplitude and frequency of supraharmonic emissions.

2016 ◽  
Vol 25 (3) ◽  
pp. 431-440 ◽  
Author(s):  
Archana Purwar ◽  
Sandeep Kumar Singh

AbstractThe quality of data is an important task in the data mining. The validity of mining algorithms is reduced if data is not of good quality. The quality of data can be assessed in terms of missing values (MV) as well as noise present in the data set. Various imputation techniques have been studied in MV study, but little attention has been given on noise in earlier work. Moreover, to the best of knowledge, no one has used density-based spatial clustering of applications with noise (DBSCAN) clustering for MV imputation. This paper proposes a novel technique density-based imputation (DBSCANI) built on density-based clustering to deal with incomplete values in the presence of noise. Density-based clustering algorithm proposed by Kriegal groups the objects according to their density in spatial data bases. The high-density regions are known as clusters, and the low-density regions refer to the noise objects in the data set. A lot of experiments have been performed on the Iris data set from life science domain and Jain’s (2D) data set from shape data sets. The performance of the proposed method is evaluated using root mean square error (RMSE) as well as it is compared with existing K-means imputation (KMI). Results show that our method is more noise resistant than KMI on data sets used under study.


2017 ◽  
Vol 46 (5) ◽  
pp. 528001 ◽  
Author(s):  
周培培 Zhou Peipei ◽  
丁庆海 Ding Qinghai ◽  
罗海波 Luo Haibo ◽  
侯幸林 Hou Xinglin

2019 ◽  
Vol 37 (2) ◽  
pp. 225-239 ◽  
Author(s):  
Hongqi Han ◽  
Yongsheng Yu ◽  
Lijun Wang ◽  
Xiaorui Zhai ◽  
Yaxin Ran ◽  
...  

PurposeThe aim of this study is to present a novel approach based on semantic fingerprinting and a clustering algorithm called density-based spatial clustering of applications with noise (DBSCAN), which can be used to convert investor records into 128-bit semantic fingerprints. Inventor disambiguation is a method used to discover a unique set of underlying inventors and map a set of patents to their corresponding inventors. Resolving the ambiguities between inventors is necessary to improve the quality of the patent database and to ensure accurate entity-level analysis. Most existing methods are based on machine learning and, while they often show good performance, this comes at the cost of time, computational power and storage space.Design/methodology/approachUsing DBSCAN, the meta and textual data in inventor records are converted into 128-bit semantic fingerprints. However, rather than using a string comparison or cosine similarity to calculate the distance between pair-wise fingerprint records, a binary number comparison function was used in DBSCAN. DBSCAN then clusters the inventor records based on this distance to disambiguate inventor names.FindingsExperiments conducted on the PatentsView campaign database of the United States Patent and Trademark Office show that this method disambiguates inventor names with recall greater than 99 per cent in less time and with substantially smaller storage requirement.Research limitations/implicationsA better semantic fingerprint algorithm and a better distance function may improve precision. Setting of different clustering parameters for each block or other clustering algorithms will be considered to improve the accuracy of the disambiguation results even further.Originality/valueCompared with the existing methods, the proposed method does not rely on feature selection and complex feature comparison computation. Most importantly, running time and storage requirements are drastically reduced.


Author(s):  
Andrius Daranda ◽  
Gintautas Dzemyda

During the last years, marine traffic dramatically increases. Marine traffic safety highly depends on the mariner’s decisions and particular situations. The watch officer must continuously observe the marine traffic for anomalies because the anomaly detection is crucial to predict dangerous situations and to make a decision in time for safe marine navigation. In this paper, we present marine traffic anomaly detection by the combination of the DBSCAN clustering algorithm (Density- Based Spatial Clustering of Applications with Noise) with k-nearest neighbors analysis among the clusters and particular vessels. The clustering algorithm is applied to the historic marine traffic data – a set of vessel turn points. In our experiments, the total number of turn points was about 3 million, and about 160 megabytes of computer store was used. A formal numerical criterion to com-pare anomaly with normal traffic flow case has been proposed. It gives us a possibility to detect the vessels outside the typical traffic pattern. The proposed meth-od ensures the right decisions in different oceanic scale or hydro meteorology conditions in the detection of anomaly situation of the vessel.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Junhui Qiu ◽  
Qi Zhou ◽  
Weicai Ye ◽  
Qianjun Chen ◽  
Yun-Juan Bao

Abstract Background The gene-specific sweep is a selection process where an advantageous mutation along with the nearby neutral sites in a gene region increases the frequency in the population. It has been demonstrated to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into the evolutionary mechanisms, but also unravel potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, some challenges have not been sufficiently addressed by the methods, such as the low spatial resolution of sweep regions and lack of consideration of the spatial distribution of mutations. Results We proposed a novel gene-centric and spatial-aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a python tool SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster is above 90%. The application of SweepCluster in two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that the impact of pre-selection was dramatic and significantly reduced the uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%. We noted that the concordance rate could be underestimated due to distinct reference genomes and clustering strategies. The application to the human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions. Conclusion SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and provide novel insights on adaptive evolution.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Hui Liu ◽  
Yang Liu ◽  
Ran Zhang ◽  
Xia Wu

Nowadays, urban multimodal big data are freely available to the public due to the growing number of cities, which plays a critical role in many fields such as transportation, education, medical treatment, and land resource management. The successful completion of poverty-relief work can greatly improve the quality of people’s life and ensure the sustainable development of the society. Poverty is a severe challenge for human society. It is of great significance to apply machine learning to mine different categories of poverty-stricken households and further provide decision support for poverty alleviation. Traditional poverty alleviation methods need to consume a lot of manpower, material resources, and financial resources. Based on the density-based spatial clustering of applications with noise (DBSCAN), this paper designs the hierarchical DBSCAN clustering algorithm to identify and analyze the categories of poverty-stricken households in China. First, the proposed method adjusts the neighborhood radius dynamically for dividing the data space into several initial clusters with different densities. Then, neighbor clusters are identified by the border and inner distances constantly and aggregated recursively to form new clusters. Based on the idea of division and aggregation, the proposed method can recognize clusters of different forms and deal with noises effectively in the data space with imbalanced density distribution. The experiments indicate that the method has the ideal performance of clustering, which identifies the commonness and difference in characteristics of poverty-stricken households reasonably. In terms of the specific indicator “Accuracy,” the accuracy increases by 2.3% compared with other methods.


2021 ◽  
Author(s):  
Junhui Qiu ◽  
Qi Zhou ◽  
Weicai Ye ◽  
Qianjun Chen ◽  
Yun-Juan Bao

AbstractBackgroundThe gene-specific sweep is a selection process where an advantageous mutation along with the nearby neutral sites in a gene region increases the frequency in the population. It has been demonstrated to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into the evolutionary mechanisms, but also unravel potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, some challenges have not been sufficiently addressed by the methods, such as the low spatial resolution of sweep regions and lack of consideration of the spatial distribution of mutations.ResultsWe proposed a novel gene-centric and spatial-aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a python tool SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the accuracy and sensitivity of the clustering algorithm in SweepCluster is above 90%. The application of SweepCluster in two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that the impact of pre-selection was dramatic and significantly reduced the uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%. We noted that the concordance rate could be underestimated due to distinct reference genomes and clustering strategies. The application to the human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and recovered the known sweep regions in a wide dynamic range of pre-selection parameters.ConclusionsSweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and provide novel insights on adaptive evolution.


2020 ◽  
Vol 71 (3) ◽  
pp. 138-149
Author(s):  
Wael Farag

AbstractIn this paper, based on the fusion of Lidar and Radar measurement data, a real-time road-Object Detection and Tracking (LR_ODT) method for autonomous driving is proposed. The lidar and radar devices are installed on the ego car, and a customized Unscented Kalman Filter (UKF) is used for their data fusion. Lidars are accurate in determining objects’ positions but significantly less accurate on measuring their velocities. However, Radars are more accurate on measuring objects velocities but less accurate on determining their positions as they have a lower spatial resolution. Therefore, the merits of both sensors are combined using the proposed fusion approach to provide both pose and velocity data for objects moving in roads precisely. The Grid-Based Density-Based Spatial Clustering of Applications with Noise (GB-DBSCAN) clustering algorithm is used to detect objects and estimate their centroids from the lidar and radar raw data. Then, the estimation of the object’s velocity as well as determining its corresponding geometrical shape is performed by the RANdom SAmple Consensus (RANSAC) algorithm. The proposed technique is implemented using the high-performance language C++ and utilizes highly optimized math and optimization libraries for best real-time performance. The performance of the UKF fusion is compared to that of the Extended Kalman Filter fusion (EKF) showing its superiority. Simulation studies have been carried out to evaluate the performance of the LR ODT for tracking bicycles, cars, and pedestrians.


Author(s):  
Kawtar Sabor ◽  
Damien Jougnot ◽  
Roger Guerin ◽  
Barthélémy Steck ◽  
Jean-Marie Henault ◽  
...  

Summary Geophysical imaging using the inversion procedure is a powerful tool for the exploration of the Earth's subsurface. However, the interpretation of inverted images can sometimes be difficult, due to the inherent limitations of existing inversion algorithms, which produce smoothed sections. In order to improve and automate the processing and interpretation of inverted geophysical models, we propose an approach inspired from data mining. We selected an algorithm known as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to perform clustering of inverted geophysical sections. The methodology relies on the automatic sorting and clustering of data. DBSCAN detects clusters in the inverted electrical resistivity values, with no prior knowledge of the number of clusters. This algorithm has the advantage of being defined by only two parameters: the neighbourhood of a point in the data space, and the minimum number of data points in this neighbourhood. We propose an objective procedure for the determination of these two parameters. The proof of concept described here is applied to simulated ERT (Electrical Resistivity Tomography) sections, for the following three cases: two layers with a step, two layers with a rebound, and two layers with an anomaly embedded in the upper layer. To validate this approach, sensitivity studies were carried out on both of the above parameters, as well as to assess the influence of noise on the algorithm's performance. Finally, this methodology was tested on real field data. DBSCAN detects clusters in the inverted electrical resistivity models, and the former are then associated with various types of earth materials, thus allowing the structure of the prospected area to be determined. The proposed data-mining algorithm is shown to be effective, and to improve the interpretation of the inverted ERT sections. This new approach has considerable potential, as it can be applied to any geophysical data represented in the form of sections or maps.


Sign in / Sign up

Export Citation Format

Share Document