Unsupervised Large‐Scale Search for Similar Earthquake Signals

Essentially, data mining concerns processing data and identifying patterns and trends in it to support decisions and judgments. Data mining concepts have been in use for years, but with the emergence of big data they have become even more common. In particular, the scalable mining of such large data sets is a difficult problem that has attracted several recent studies. A few of these recent works use the MapReduce methodology to construct data mining models across the data set. In this article, we examine current approaches to large-scale data mining and compare their output to the MapReduce model. Based on our research, a system for data mining that combines MapReduce and sampling is implemented and discussed.
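The abstract above does not say how MapReduce and sampling are combined, so the following is only a minimal sketch of one possible reading: draw a random sample of the records, run a map step and a reduce step over the sample, and scale the resulting counts by the inverse of the sampling fraction to estimate full-data counts. The function names (`map_phase`, `reduce_phase`, `sampled_item_counts`) and the toy transactions are hypothetical and are not taken from the article.

```python
import random
from collections import defaultdict

def map_phase(record):
    # Map step: emit one (item, 1) pair for every item in a transaction.
    return [(item, 1) for item in record]

def reduce_phase(pairs):
    # Reduce step: sum the counts per key, as a reducer would after the shuffle.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def sampled_item_counts(records, fraction, seed=0):
    # Assumption: independent Bernoulli sampling of records with a fixed seed.
    rng = random.Random(seed)
    sample = [r for r in records if rng.random() < fraction]
    pairs = [pair for r in sample for pair in map_phase(r)]
    counts = reduce_phase(pairs)
    # Scale up by 1/fraction to estimate counts over the full data set.
    return {key: value / fraction for key, value in counts.items()}

records = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
exact = sampled_item_counts(records, 1.0)      # fraction 1.0 keeps every record
estimate = sampled_item_counts(records, 0.5)   # noisy estimate from about half the records
```

With `fraction=1.0` the estimate equals the exact counts; smaller fractions trade accuracy for less work in the map and reduce steps.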

Construction of a century solar chromosphere data set for solar activity related research

Solar-Terrestrial Physics ◽

10.12737/stp-3220171 ◽

2017 ◽

Vol 3 (2) ◽

pp. 5-8

Author(s):

Линь Ганхуа ◽

Lin Ganghua ◽

Ван Сяо-Фань ◽

Wang Xiao Fan ◽

Ян Сяо ◽

...

Keyword(s):

Data Mining ◽

Solar Activity ◽

Space Weather ◽

Large Scale ◽

Abnormal Behavior ◽

Solar Chromosphere ◽

Solar Cycles ◽

Data Set ◽

Related Research ◽

Data Mining Algorithms

This article introduces our ongoing project “Construction of a Century Solar Chromosphere Data Set for Solar Activity Related Research”. Solar activities are the major sources of space weather that affects human lives. Serious space weather consequences include, for instance, interruption of space communication and navigation, compromised safety of astronauts and satellites, and damage to power grids. Therefore, solar activity research has both scientific and social impacts. The major database is built up from digitized and standardized film data obtained by several observatories around the world and covers a time span of more than 100 years. After careful calibration, we will develop feature extraction and data mining tools and provide them together with the comprehensive database for the astronomical community. Our final goal is to address several physical issues: filament behavior in solar cycles, abnormal behavior of solar cycle 24, large-scale solar eruptions, and sympathetic remote brightenings. Significant progress is expected in data mining algorithms and software development, which will benefit the scientific analysis and eventually advance our understanding of solar cycles.

Workload Optimization by Horizontal Aggregation in SQL for Data Mining Analysis

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit217263 ◽

2021 ◽

pp. 304-309

Author(s):

Prasanna M. Rathod ◽

Prof. Dr. Anjali B. Raut

Keyword(s):

Data Mining ◽

Relational Algebra ◽

Data Migration ◽

Data Sets ◽

Data Set ◽

Application Performance ◽

Data Mining Algorithms ◽

Mining Project ◽

Pivot Methods ◽

Mining Algorithms

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, significant manual effort is required to build data sets where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: (1) CASE, exploiting the programming CASE construct; (2) SPJ, based on standard relational algebra operators (SPJ queries); and (3) PIVOT, using the PIVOT operator, which is offered by some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator and is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the SPJ method does not. For query optimization, the distance computation and nearest-cluster search in k-means are based on SQL. Workload balancing is the assignment of work to processors in a way that maximizes application performance. The process of load balancing can be generalized into four basic steps: (1) monitoring processor load and state; (2) exchanging workload and state information between processors; (3) decision making; and (4) data migration. The decision phase is triggered when a load imbalance is detected, in order to calculate an optimal data redistribution. In the fourth and last phase, data migrates from overloaded processors to under-loaded ones.
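As an illustration of the CASE method described above, the sketch below generates a query with one `SUM(CASE ...)` column per distinct value of a pivoting column and runs it against an in-memory SQLite database. This is not the authors' code generator: the helper `horizontal_sum_sql`, the `sales` table, and its columns are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

def horizontal_sum_sql(table, group_col, pivot_col, value_col, pivot_values):
    # Build one SUM(CASE ...) column per pivot value (the CASE strategy).
    # Identifiers and values are interpolated directly, so this sketch is
    # only suitable for trusted, simple names such as those used below.
    cases = ",\n  ".join(
        f"SUM(CASE WHEN {pivot_col} = '{v}' THEN {value_col} ELSE 0 END) "
        f"AS {value_col}_{v}"
        for v in pivot_values
    )
    return (
        f"SELECT {group_col},\n  {cases}\n"
        f"FROM {table}\nGROUP BY {group_col}\nORDER BY {group_col}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "Q1", 10.0), ("A", "Q2", 20.0), ("A", "Q1", 5.0),
     ("B", "Q1", 7.0), ("B", "Q2", 3.0)],
)

# Discover the pivot values first, then generate the horizontal query.
quarters = [r[0] for r in conn.execute(
    "SELECT DISTINCT quarter FROM sales ORDER BY quarter")]
query = horizontal_sum_sql("sales", "store", "quarter", "amount", quarters)
rows = conn.execute(query).fetchall()
```

Each result row holds one store together with its per-quarter totals side by side, which is the horizontal, one-row-per-instance layout the abstract refers to.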

Comparative Analysis of Kohonen-SOM and K-Means data mining algorithms based on Academic Activities

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v6i1.4449 ◽

2013 ◽

Vol 6 (1) ◽

pp. 237-241

Author(s):

Shaina Dhingra ◽

Rimple Gilhotra ◽

Ravishanker Ravishanker

Keyword(s):

Data Mining ◽

Comparative Analysis ◽

Large Scale ◽

Decision Makers ◽

Data Set ◽

Data Mining Algorithms ◽

Academic Activities ◽

Increasing Demand ◽

Kohonen Som ◽

Mining Algorithms

With the increasing demand for IT and the subsequent growth in this sector, high-dimensional data came into existence. Data mining plays an important role in analyzing and extracting useful information. The key information extracted from a huge pool of data is useful for decision makers. Clustering, one of the techniques of data mining, is among the most widely used methods of analyzing data. In this paper, the Kohonen SOM, K-Means, and HAC approaches are discussed. These three methods are then used to analyze the academic data set of the faculty members of a particular university. Finally, a comparative analysis of these algorithms is done against parameters such as number of clusters, error rate, and accessing rate. This work will present new and improved results from large-scale datasets.
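For readers unfamiliar with K-Means, one of the three clustering methods named above, the following is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the implementation used in the paper; the toy points, the initial centers, and the fixed iteration count are assumptions chosen for illustration.

```python
def squared_distance(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, iterations=10):
    # Lloyd's algorithm: alternate an assignment step and an update step.
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: attach each point to its nearest center.
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centers)
        ]
    return centers, clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centers, final_clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```

The result depends on the initial centers and on the chosen number of clusters, which is one reason comparisons between clustering methods typically report results for several configurations.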

WEB APPLICATION FOR LARGE-SCALE MULTIDIMENSIONAL DATA VISUALIZATION

Mathematical Modelling and Analysis ◽

10.3846/13926292.2011.580381 ◽

2011 ◽

Vol 16 (1) ◽

pp. 273-285 ◽

Cited By ~ 4

Author(s):

Gintautas Dzemyda ◽

Virginijus Marcinkevičius ◽

Viktor Medvedev

Keyword(s):

Data Mining ◽

Data Visualization ◽

Web Application ◽

Large Scale ◽

Visual Presentation ◽

Multidimensional Data ◽

Data Sets ◽

Data Set ◽

Multidimensional Data Visualization ◽

Multidimensional Data Set

In this paper, we present an approach to a web application (as a service) for data mining oriented toward multidimensional data visualization. This paper focuses on visualization methods as a tool for the visual presentation of large-scale multidimensional data sets. The proposed implementation of such a web application takes a multidimensional data set as input and produces a visualization of this data set. It also supports different configuration parameters of the data mining methods used. Parallel computation has been used in the proposed implementation to run the algorithms simultaneously on different computers.

Construction of a century solar chromosphere data set for solar activity related research

Solnechno-Zemnaya Fizika ◽

10.12737/22609 ◽

2017 ◽

Vol 3 (2) ◽

pp. 5-9

Author(s):

Линь Ганхуа ◽

Lin Ganghua ◽

Ван Сяо-Фань ◽

Wang Xiao Fan ◽

Ян Сяо ◽

...

Keyword(s):

Data Mining ◽

Solar Activity ◽

Space Weather ◽

Large Scale ◽

Abnormal Behavior ◽

Solar Chromosphere ◽

Solar Cycles ◽

Data Set ◽

Related Research ◽

Data Mining Algorithms

This article introduces our ongoing project “Construction of a Century Solar Chromosphere Data Set for Solar Activity Related Research”. Solar activities are the major sources of space weather that affects human lives. Serious space weather consequences include, for instance, interruption of space communication and navigation, compromised safety of astronauts and satellites, and damage to power grids. Therefore, solar activity research has both scientific and social impacts. The major database is built up from digitized and standardized film data obtained by several observatories around the world and covers a time span of more than 100 years. After careful calibration, we will develop feature extraction and data mining tools and provide them together with the comprehensive database for the astronomical community. Our final goal is to address several physical issues: filament behavior in solar cycles, abnormal behavior of solar cycle 24, large-scale solar eruptions, and sympathetic remote brightenings. Significant progress is expected in data mining algorithms and software development, which will benefit the scientific analysis and eventually advance our understanding of solar cycles.

Comparison of the accuracy of classification algorithms on three data-sets in data mining: Example of 20 classes

International Journal of Engineering Science and Technology ◽

10.4314/ijest.v12i3.8 ◽

2020 ◽

Vol 12 (3) ◽

pp. 81-89

Author(s):

T. Sanlı ◽

Ç. Sıcakyüz ◽

O.H. Yüregir

Keyword(s):

Data Mining ◽

Text Mining ◽

Web Mining ◽

Industrial Engineering ◽

Classification Algorithms ◽

Data Sets ◽

Data Set ◽

Data Mining Algorithms ◽

Clustering And Classification ◽

Doctoral Dissertations

Data mining, which has different uses such as text mining and web mining, is especially used for clustering and classification purposes. In this study, it was used for both classification and text mining. The aim of the study was to assess the performance of data mining algorithms on three data sets. A total of 6631 master's and doctoral dissertations written in the field of industrial engineering were downloaded from the Higher Education Council database. Using the summaries, subject titles, and keywords of these dissertations, an attempt was made to predict, with the WEKA program, which sub-field of industrial engineering each dissertation belongs to. As a result, it was observed that the data set containing the keywords obtained by weighting the expert opinion was more successful than the other two data sets. The three most successful classification algorithms were found to be kNN, SMO, and J48, respectively. Keywords: Classification Algorithms, Data Mining, Multiple Classes, Dataset.

Data Mining Algorithms: An Overview

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v15i6.1615 ◽

2016 ◽

Vol 15 (6) ◽

pp. 6806-6813 ◽

Cited By ~ 2

Author(s):

Sethunya R Joseph ◽

Hlomani Hlomani ◽

Keletso Letsholo

Keyword(s):

Data Mining ◽

Large Scale ◽

Predictive Analytics ◽

Large Data ◽

Knowledge Discovery In Databases ◽

Data Sets ◽

Data Mining Algorithms ◽

Data Extract ◽

Mining Algorithms ◽

Operational Processes

Research on data mining has successfully yielded numerous tools, algorithms, methods, and approaches for handling large amounts of data for various purposeful uses and problem solving. Data mining has become an integral part of many application domains such as data warehousing, predictive analytics, business intelligence, bio-informatics, and decision support systems. The prime objective of data mining is to effectively handle large-scale data, extract actionable patterns, and gain insightful knowledge. Data mining is part and parcel of the knowledge discovery in databases (KDD) process. Success and improved decision making normally depend on how quickly one can discover insights from data. These insights could be used to drive better actions, which can be used in operational processes and even to predict future behaviour. This paper presents an overview of various algorithms necessary for handling large data sets. These algorithms define various structures and methods implemented to handle big data. The review also discusses the general strengths and limitations of these algorithms. This paper can serve as a quick guide or an eye-opener for data mining researchers on which algorithm(s) to select and apply in solving the problems they will be investigating.

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2020.46 ◽

2020 ◽

Vol 37 ◽

Author(s):

Lior Shamir

Keyword(s):

Large Scale ◽

Spiral Galaxies ◽

Hubble Space Telescope ◽

Gravitational Interaction ◽

Large Data ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

Dipole Axis ◽

Data Set ◽

The Asymmetry

Abstract. Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.

The Midlatitude Continental Convective Clouds Experiment (MC3E) sounding network: operations, processing and analysis

Atmospheric Measurement Techniques ◽

10.5194/amt-8-421-2015 ◽

2015 ◽

Vol 8 (1) ◽

pp. 421-434 ◽

Cited By ~ 18

Author(s):

M. P. Jensen ◽

T. Toto ◽

D. Troyan ◽

P. E. Ciesielski ◽

D. Holdridge ◽

...

Keyword(s):

Large Scale ◽

Scale Model ◽

Data Sets ◽

Central Plains ◽

Data Set ◽

Convective Systems ◽

Convective Clouds ◽

Quality Checks ◽

Network Operations ◽

The Impact

Abstract. The Midlatitude Continental Convective Clouds Experiment (MC3E) took place during the spring of 2011, centered in north-central Oklahoma, USA. The main goal of this field campaign was to capture the dynamical and microphysical characteristics of precipitating convective systems in the US Central Plains. A major component of the campaign was a six-site radiosonde array designed to capture the large-scale variability of the atmospheric state with the intent of deriving model forcing data sets. Over the course of the 46-day MC3E campaign, a total of 1362 radiosondes were launched from the enhanced sonde network. This manuscript provides details on the instrumentation used as part of the sounding array; the data processing activities, including quality checks and humidity bias corrections; and an analysis of the impacts of bias correction and algorithm assumptions on the determination of convective levels and indices. It is found that corrections for known radiosonde humidity biases and assumptions regarding the characteristics of the surface convective parcel result in significant differences in the derived values of convective levels and indices in many soundings. In addition, the impact of including the humidity corrections and quality controls on the thermodynamic profiles that are used in the derivation of a large-scale model forcing data set is investigated. The results show a significant impact on the derived large-scale vertical velocity field, illustrating the importance of addressing these humidity biases.
