The usage of RFID technology using Big Data in logistics processes

AUTOBUSY – Technika Eksploatacja Systemy Transportowe ◽

10.24136/atest.2018.497 ◽

2018 ◽

Vol 19 (12) ◽

pp. 780-782

Author(s):

Zbigniew Łukasik ◽

Bartłomiej Ulatowski ◽

Łukasz Łukasik

Keyword(s):

Big Data ◽

Data Warehouse ◽

Radio Frequency Identification ◽

Information Quality ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Rfid Technology ◽

Data Set ◽

Frequency Identification

This article shows how large sets of logistics data collected with radio frequency identification (RFID) technology can be used. First, the so-called RFID-Cuboids processors are introduced to build a data warehouse, so that logistics data managed via RFID technology can be highly integrated in terms of specific logic and operations. Second, tables are used to combine the data in order to increase information quality and reduce the data set volume.

Machine Learning (ML) for Tracking Fashion Trends: Documenting the Frequency of the Baseball Cap on Social Media and the Runway

Clothing and Textiles Research Journal ◽

10.1177/0887302x20931195 ◽

2020 ◽

pp. 0887302X2093119 ◽

Cited By ~ 1

Author(s):

Rachel Rose Getman ◽

Denise Nicole Green ◽

Kavita Bala ◽

Utkarsh Mall ◽

Nehal Rawat ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Fashion Studies ◽

Computer Scientists ◽

High Level ◽

Cultural Shifts

With the proliferation of digital photographs and the increasing digitization of historical imagery, fashion studies scholars must consider new methods for interpreting large data sets. Computational methods to analyze visual forms of big data have been underway in the field of computer science through computer vision, where computers are trained to “read” images through a process called machine learning. In this study, fashion historians and computer scientists collaborated to explore the practical potential of this emergent method by examining a trend related to one particular fashion item—the baseball cap—across two big data sets—the Vogue Runway database (2000–2018) and the Matzen et al. Streetstyle-27K data set (2013–2016). We illustrate one implementation of high-level concept recognition to map a fashion trend. Tracking trend frequency helps visualize larger patterns and cultural shifts while creating sociohistorical records of aesthetics, which benefits fashion scholars and industry alike.

Generation of geometric interpolations of building types with deep variational autoencoders

Design Science ◽

10.1017/dsj.2020.31 ◽

2020 ◽

Vol 6 ◽

Author(s):

Jaime de Miguel Rodríguez ◽

Maria Eugenia Villafañe ◽

Luka Piškorec ◽

Fernando Sancho Caparrini

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Large Data ◽

Learning Model ◽

Large Data Sets ◽

Data Sets ◽

Connectivity Map ◽

Data Set ◽

3D Objects ◽

Machine Learning Model

This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
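
The "interpolated locations" mentioned in the abstract can be pictured with a minimal sketch. It assumes plain linear interpolation between two latent vectors, which the abstract does not confirm; the function name `interpolate_latents` is hypothetical, and the trained decoder that would turn each latent vector back into a wireframe is not shown.

```python
from typing import List, Sequence


def interpolate_latents(z_a: Sequence[float],
                        z_b: Sequence[float],
                        steps: int) -> List[List[float]]:
    """Return `steps` latent vectors evenly spaced on the line from z_a to z_b.

    Assumption: linear interpolation. The paper may use a different scheme.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (z_a) to 1.0 (z_b)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path
```

Each vector on the returned path would then be passed to the VAE decoder to reconstruct a hybrid geometry between the two building types.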

Effects of genotype and lactation number on health and reproductive problems in dairy cows

Proceedings of the British Society of Animal Science ◽

10.1017/s1752756200595842 ◽

1997 ◽

Vol 1997 ◽

pp. 143-143

Author(s):

B.L. Nielsen ◽

R.F. Veerkamp ◽

J.E. Pryce ◽

G. Simm ◽

J.D. Oldham

Keyword(s):

Dairy Cows ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Variation Analysis ◽

Genetic Line ◽

Data Set ◽

Health Events ◽

Use Of Data ◽

Low Incidence

High-producing dairy cows have been found to be more susceptible to disease (Jones et al., 1994; Göhn et al., 1995), raising concerns about the welfare of the modern dairy cow. Genotype and number of lactations may affect various health problems differently, and their relative importance may vary. The categorical nature and low incidence of health events necessitate large data-sets, but the use of data collected across herds may introduce unwanted variation. Analysis of a comprehensive data-set from a single herd was carried out to investigate the effects of genetic line and lactation number on the incidence of various health and reproductive problems.

Extreme Learning Machine with sigmoid activation function on large data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1433.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3523-3526

Keyword(s):

Efficient Algorithm ◽

Large Data ◽

Activation Function ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Learning Machine ◽

Sigmoid Activation Function ◽

State Of Art ◽

Better Than

This paper describes an efficient algorithm for classification on large data sets. While many classification algorithms exist, they are not suitable for larger contents and different data sets. Various ELM algorithms for working with large data sets are available in the literature. However, the existing algorithms use a fixed activation function, which may lead to deficiencies when working with large data. In this paper, we propose a novel ELM that uses a sigmoid activation function. The experimental evaluations demonstrate that our ELM-S algorithm performs better than ELM, SVM and other state-of-the-art algorithms on large data sets.
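
For orientation, a basic Extreme Learning Machine with a sigmoid hidden layer can be sketched as follows. This is the generic textbook formulation (random, untrained hidden weights and a least-squares output layer), not the authors' ELM-S variant, whose details the abstract does not give. The function names and the `n_hidden` and `seed` parameters are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def train_elm(X, Y, n_hidden, seed=0):
    """Fit a single-hidden-layer ELM.

    Hidden weights W and biases b are drawn at random and never trained;
    only the output weights beta are solved for, in closed form.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = sigmoid(X @ W + b)            # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y      # least-squares output weights
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Return raw output scores; take argmax over columns for class labels."""
    return sigmoid(X @ W + b) @ beta
```

With one-hot targets `Y`, the predicted class of each row is `predict_elm(...).argmax(axis=1)`. Because no iterative training is involved, the cost is dominated by the pseudo-inverse of the hidden-layer matrix.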

Sensing Big Data: Multimodal Information Interfaces for Exploration of Large Data Sets

Big Data at Work ◽

10.4324/9781315780504-12 ◽

2015 ◽

pp. 172-192

Keyword(s):

Big Data ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Multimodal Information

A Detailed Study on Classification Algorithms in Big Data

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-9750-6.ch002 ◽

2020 ◽

pp. 30-46

Author(s):

Saranya N. ◽

Saravana Selvam

Keyword(s):

Big Data ◽

Random Forest ◽

Linear Regression ◽

Comprehensive Evaluation ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Classification Methods ◽

Computing Science ◽

Data Collections

After an era of managing data collection difficulties, the issue has now turned into the problem of how to process these vast amounts of information. Scientists and researchers think that Big Data is probably the most essential topic in computing science today. The term Big Data describes huge volumes of data that can exist in any structure. This makes it difficult for standard processing approaches to mine the best possible data from such large data sets. Classification in Big Data is a procedure for summarizing data sets based on various examples. There are distinct classification frameworks which help us to classify data collections. A few methods discussed in the chapter are Multi-Layer Perceptron, Linear Regression, C4.5, CART, J48, SVM, ID3, Random Forest, and KNN. The aim of this chapter is to provide a comprehensive evaluation of the classification methods that are commonly used.

Uncertainty-Based Clustering Algorithms for Large Data Sets

Modern Technologies for Big Data Classification and Clustering - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2805-0.ch001 ◽

2018 ◽

pp. 1-33 ◽

Cited By ~ 1

Author(s):

B. K. Tripathy ◽

Hari Seetha ◽

M. N. Murty

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithms ◽

Large Data ◽

Large Data Sets ◽

Mining Machine ◽

Data Sets ◽

Fuzzy C Means ◽

Intuitionistic Fuzzy ◽

New Algorithms

Data clustering plays a very important role in data mining, machine learning and image processing. As modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelised versions, possibilistic versions, and possibilistic kernelised versions. However, none of the above algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to improve these algorithms so that they can be applied to cluster big data. Such algorithms are relatively few in comparison to those for data sets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms which can be developed further.
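
As a point of reference for the family of algorithms listed above, the standard fuzzy c-means iteration can be sketched as below. It alternates between updating the cluster centres and the membership matrix. This is the classical formulation, not one of the new algorithms proposed in the chapter; the parameter names (`m` for the fuzzifier, `tol`, `seed`) are illustrative choices.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Classical fuzzy c-means.

    Returns (centers, U) where U[i, j] is the degree to which point i
    belongs to cluster j; every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # random initial memberships
    for _ in range(n_iter):
        Um = U ** m
        # Centres are membership-weighted means of the data.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)          # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Each iteration computes all point-to-centre distances, so the cost grows with the number of points times the number of clusters. This is one reason why the plain algorithm becomes expensive on very large data sets.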

Electronic Records Management - An Old Solution to a New Problem

Big Data ◽

10.4018/978-1-4666-9840-6.ch102 ◽

2016 ◽

pp. 2249-2274

Author(s):

Chinh Nguyen ◽

Rosemary Stockdale ◽

Helana Scheepers ◽

Jason Sargent

Keyword(s):

Big Data ◽

Rapid Development ◽

Large Data ◽

Large Data Sets ◽

Electronic Records ◽

Future Research ◽

Records Management ◽

Data Sets ◽

Interactive Nature ◽

Electronic Records Management

The rapid development of technology and interactive nature of Government 2.0 (Gov 2.0) is generating large data sets for Government, resulting in a struggle to control, manage, and extract the right information. Therefore, research into these large data sets (termed Big Data) has become necessary. Governments are now spending significant finances on storing and processing vast amounts of information because of the huge proliferation and complexity of Big Data and a lack of effective records management. On the other hand, there is a method called Electronic Records Management (ERM), for controlling and governing the important data of an organisation. This paper investigates the challenges identified from reviewing the literature for Gov 2.0, Big Data, and ERM in order to develop a better understanding of the application of ERM to Big Data to extract useable information in the context of Gov 2.0. The paper suggests that a key building block in providing useable information to stakeholders could potentially be ERM with its well established governance policies. A framework is constructed to illustrate how ERM can play a role in the context of Gov 2.0. Future research is necessary to address the specific constraints and expectations placed on governments in terms of data retention and use.

Scalable Non-Parametric Methods for Large Data Sets

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch260 ◽

2011 ◽

pp. 1708-1713

Author(s):

V. Suresh Babu ◽

P. Viswanath ◽

Narasimha M. Murty

Keyword(s):

Nearest Neighbor ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Parametric Methods ◽

Clustering Method ◽

Data Set ◽

Computational Burden ◽

Set Size ◽

Non Parametric

Non-parametric methods like the nearest neighbor classifier (NNC) and the Parzen-window based density estimation (Duda, Hart & Stork, 2000) are more general than parametric methods because they do not make any assumptions regarding the form of the probability distribution. Further, they show good performance in practice with large data sets. These methods, either explicitly or implicitly, estimate the probability density at a given point in a feature space by counting the number of points that fall in a small region around the given point. Popular classifiers which use this approach are the NNC and its variants like the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stork, 2000), whereas DBSCAN is a popular density based clustering method (Han & Kamber, 2001) which uses this approach. These methods show good performance, especially with larger data sets. The asymptotic error rate of NNC is less than twice the Bayes error (Cover & Hart, 1967), and DBSCAN can find arbitrarily shaped clusters along with noisy outlier detection (Ester, Kriegel & Xu, 1996). The most prominent difficulty in applying non-parametric methods to large data sets is their computational burden. The space and classification time complexities of NNC and k-NNC are O(n), where n is the training set size. The time complexity of DBSCAN is O(n²). So, these methods are not scalable for large data sets. Some of the remedies to reduce this burden are as follows. (1) Reduce the training set size by some editing techniques in order to eliminate some of the training patterns which are redundant in some sense (Dasarathy, 1991). For example, the condensed NNC (Hart, 1968) is of this type. (2) Use only a few selected prototypes from the data set. For example, the Leaders-subleaders method and the l-DBSCAN method are of this type (Vijaya, Murthy & Subramanian, 2004; Viswanath & Rajwala, 2006). These two remedies can reduce the computational burden, but this can also result in poor performance of the method.
Using enriched prototypes can improve the performance, as done in (Asharaf & Murthy, 2003), where the prototypes are derived using adaptive rough fuzzy set theory, and in (Suresh Babu & Viswanath, 2007), where the prototypes are used along with their relative weights. Using a few selected prototypes can reduce the computational burden. Prototypes can be derived by employing a clustering method like the leaders method (Spath, 1980), the k-means method (Jain, Dubes, & Chen, 1987), etc., which can find a partition of the data set where each block (cluster) of the partition is represented by a prototype called leader, centroid, etc. But these prototypes cannot be used to estimate the probability density, since the density information present in the data set is lost while deriving the prototypes. The chapter proposes to use a modified leader clustering method, called the counted-leader method, which along with deriving the leaders preserves the crucial density information in the form of a count which can be used in estimating the densities. The chapter presents a fast and efficient nearest prototype based classifier called the counted k-nearest leader classifier (ck-NLC), which is on par with the conventional k-NNC but considerably faster. The chapter also presents a density based clustering method called l-DBSCAN, which is shown to be a faster and scalable version of DBSCAN (Viswanath & Rajwala, 2006). Formally, under some assumptions, it is shown that the number of leaders is upper-bounded by a constant which is independent of the data set size and the distribution from which the data set is drawn.
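
The idea of keeping a count with each leader can be illustrated with a short sketch. It follows the general description above (a single pass over the data, a distance threshold, and a per-leader counter) and is only one plausible reading of it, not the chapter's exact procedure. The function name `counted_leaders` and the threshold parameter `tau` are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def counted_leaders(points: Sequence[Sequence[float]],
                    tau: float) -> List[Tuple[tuple, int]]:
    """Single-pass leader clustering that also counts followers.

    A point joins the first existing leader within distance tau and
    increments that leader's count; otherwise it becomes a new leader
    with count 1. Returns a list of (leader, count) pairs.
    """
    leaders = []  # each entry is [leader_point, count]
    for p in points:
        for entry in leaders:
            if math.dist(p, entry[0]) <= tau:
                entry[1] += 1
                break
        else:
            # No leader was close enough, so p starts a new cluster.
            leaders.append([p, 1])
    return [(tuple(leader), count) for leader, count in leaders]
```

The counts sum to the number of input points, so each one records how much of the data its leader stands for. That is the density information a plain list of leaders would lose.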

Applying the K-Means Algorithm in Big Raw Data Sets with Hadoop and MapReduce

Business Intelligence ◽

10.4018/978-1-4666-9562-7.ch062 ◽

2016 ◽

pp. 1220-1243

Author(s):

Ilias K. Savvas ◽

Georgia N. Sofianidou ◽

M-Tahar Kechadi

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

File System ◽

Large Data ◽

Large Data Sets ◽

Distributed File System ◽

Data Sets ◽

Raw Data ◽

Hadoop Distributed File System ◽

Access To Data

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets, while HDFS is a distributed file system that provides high-throughput access to data for data-driven applications, and MapReduce is a software framework for distributed computing on large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular techniques of data mining is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. The theoretical and experimental results of the technique prove its efficiency; thus, HDFS and MapReduce can be applied to big data with very promising results.
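
How K-means maps onto the MapReduce pattern can be shown with a small in-memory sketch. It only simulates the two phases in plain Python and does not use the Hadoop API; the authors' actual implementation is not described in the abstract. The function names `map_phase`, `reduce_phase` and `kmeans` are hypothetical.

```python
import math
from collections import defaultdict


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: math.dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce step: average the points emitted under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # centroids with no points stay unchanged
    for idx, members in groups.items():
        dim = len(members[0])
        new_centroids[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centroids


def kmeans(points, centroids, n_iter=10):
    """Run map and reduce rounds until the centroids stop changing."""
    centroids = list(centroids)
    for _ in range(n_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In a real cluster the map step would run in parallel over blocks of the input file and the reduce step would run once per centroid key. Each K-means iteration then corresponds to one MapReduce job.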
