Constructing a Lightweight Key-Value Store Based on the Windows Native Features

Hyuk-Yoon Kwon

doi:10.3390/app9183801

Applied Sciences ◽

10.3390/app9183801 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3801 ◽

Cited By ~ 1

Author(s):

Hyuk-Yoon Kwon

Keyword(s):

State Of The Art ◽

Main Idea ◽

Real Data ◽

Data Sets ◽

Parameter Setting ◽

Data Set ◽

Multi Level ◽

Windows Registry ◽

Best Parameter ◽

Better Than

In this paper, we propose a method to construct a lightweight key-value store based on Windows native features. The main idea is to provide a thin wrapper for the key-value store on top of a built-in storage in Windows, the Windows registry. First, we define a mapping of the components in the key-value store onto the components in the Windows registry. Second, we present a hash-based multi-level registry index to distribute the key-value data evenly and to access them efficiently. Third, we implement the basic operations of the key-value store (i.e., Get, Put, and Delete) by manipulating the Windows registry using the Windows native APIs. We call the proposed key-value store WR-Store. Finally, we propose an efficient ETL (Extract-Transform-Load) method to migrate data stored in WR-Store into any other environment that supports existing key-value stores. Because the performance of the Windows registry has not been studied much, we perform an empirical study to understand the characteristics of WR-Store and then tune its performance to find the best parameter setting. Through extensive experiments using synthetic and real data sets, we show that the performance of WR-Store is comparable to or even better than that of state-of-the-art systems (i.e., RocksDB, BerkeleyDB, and LevelDB). In particular, we show the scalability of WR-Store: it becomes much more efficient than the other key-value stores as the size of the data set increases. In addition, we show that the performance of WR-Store is maintained even under intensive registry workloads in which 1000 processes that actively access the registry run concurrently.
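The hash-based multi-level index and the Get, Put, and Delete operations described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: nested dictionaries stand in for registry keys and values so that the code runs on any platform, and the class name, the SHA-256 hash, and the `levels` and `fanout` parameters are all assumptions made for illustration. The abstract does not state which hash function, depth, or fan-out WR-Store uses.

```python
import hashlib


class RegistryLikeStore:
    """Toy key-value store with a hash-based multi-level index.

    Nested dicts stand in for registry keys (subkeys); the innermost dict
    stands in for the values stored under one registry key. `levels` and
    `fanout` are hypothetical tuning parameters, not values from the paper.
    """

    def __init__(self, levels=2, fanout=16):
        self.levels = levels    # depth of the index (at most 32 here)
        self.fanout = fanout    # buckets per level (at most 256 here)
        self.root = {}

    def _path(self, key):
        # One bucket index per level, taken from a stable hash of the key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return [digest[i] % self.fanout for i in range(self.levels)]

    def _leaf(self, key, create):
        # Walk (and optionally create) the chain of buckets for this key.
        node = self.root
        for bucket in self._path(key):
            if bucket not in node:
                if not create:
                    return None
                node[bucket] = {}
            node = node[bucket]
        return node

    def put(self, key, value):
        self._leaf(key, create=True)[key] = value

    def get(self, key, default=None):
        leaf = self._leaf(key, create=False)
        if leaf is None:
            return default
        return leaf.get(key, default)

    def delete(self, key):
        # Returns True if the key existed and was removed.
        leaf = self._leaf(key, create=False)
        if leaf is None or key not in leaf:
            return False
        del leaf[key]
        return True
```

Hashing each key to a fixed-depth path keeps any single bucket small, which is the balancing effect the abstract attributes to the multi-level index. On Windows, the same path could in principle be mapped to nested registry subkeys through Python's standard `winreg` module; that mapping is not shown here.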

A Microsoft Word Documents Carving Method Base on Interior Virtual Streams

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.3028 ◽

2012 ◽

Vol 433-440 ◽

pp. 3028-3032 ◽

Cited By ~ 1

Author(s):

Wei Lin ◽

Ming Xu

Keyword(s):

Word Processing ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Microsoft Word ◽

Sequential Hypothesis ◽

File Carving ◽

Sequential Hypothesis Testing ◽

And Control ◽

Better Than

Microsoft Word is widely used for word processing and document production, so the loss of a Word file can do great damage to its owner. File carving reconstructs files based on their content and structure rather than on the metadata of the file system. Existing methods do not work well on fragmented Word files. This paper presents a carving method for Microsoft Word documents based on interior virtual streams. First, we locate the header section and control streams-SAT to construct the framework of a Word file; we then find its fragment regions by using the framework information, and apply sequential hypothesis testing to the data streams in each fragment region to detect the fragment point. Experiments on the DFRWS data sets and a real data set show that our method can automatically carve both contiguous and fragmented Microsoft Word files. Moreover, comparative experiments demonstrate that the proposed method is better than others in accuracy and effectiveness.
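The abstract does not say how the sequential hypothesis test is set up, so the following is only a generic sketch of Wald's sequential probability ratio test on 0/1 observations. The reading of an observation as "this block looks inconsistent with the current stream", the probabilities `p0` and `p1`, and the error rates `alpha` and `beta` are illustrative assumptions, not details from the paper.

```python
import math


def sprt(observations, p0, p1, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test on 0/1 observations.

    H0: P(obs == 1) = p0 (e.g. the block still belongs to the file)
    H1: P(obs == 1) = p1 (e.g. a fragment point has been passed)
    Returns (decision, n_used), decision in {"H0", "H1", "undecided"}.
    The mapping of hypotheses to carving is a hypothetical illustration.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0                              # running log-likelihood ratio
    for n, obs in enumerate(observations, start=1):
        if obs:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(observations)
```

The test stops as soon as the accumulated evidence crosses either threshold, which is why a sequential test can locate a change after inspecting only a few blocks rather than a fixed-size window.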

Evaluation for estimating of the PDF and the CDF of Generalized Inverted Exponential Distribution with Application in Industry

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.1.39 ◽

2020 ◽

pp. 507-522

Author(s):

Parisa Torkaman

Keyword(s):

Least Squares ◽

Exponential Distribution ◽

Mean Squared Error ◽

Weighted Least Squares ◽

Real Data ◽

Minimum Variance ◽

Cumulative Distribution ◽

Estimation Methods ◽

Data Set ◽

Better Than

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, the estimation of the probability density function and the cumulative distribution function of this distribution is considered with five different estimation methods: the uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation, based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS, and PC. Finally, the results for a real data set are analyzed.
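As a rough illustration of what an ML plug-in estimate of the PDF and CDF looks like, the sketch below uses the commonly cited two-parameter form of the generalized inverted exponential distribution, F(x) = 1 - (1 - exp(-λ/x))^α for x > 0. The abstract does not state the parameterization or which parameters are treated as known, so both are assumptions here: the scale λ is taken as known and only the shape α is estimated. The function names are hypothetical, and the other four estimators are not reproduced.

```python
import math


def gied_cdf(x, alpha, lam):
    """CDF F(x) = 1 - (1 - exp(-lam/x))**alpha, for x > 0."""
    return 1.0 - (1.0 - math.exp(-lam / x)) ** alpha


def gied_pdf(x, alpha, lam):
    """PDF f(x) = (alpha*lam/x**2) * exp(-lam/x) * (1 - exp(-lam/x))**(alpha - 1)."""
    e = math.exp(-lam / x)
    return (alpha * lam / x ** 2) * e * (1.0 - e) ** (alpha - 1.0)


def gied_quantile(u, alpha, lam):
    """Inverse CDF for 0 < u < 1, obtained by solving F(x) = u for x."""
    return -lam / math.log(1.0 - (1.0 - u) ** (1.0 / alpha))


def mle_alpha(sample, lam):
    """Closed-form ML estimate of alpha, ASSUMING lam is known.

    Setting the derivative of the log-likelihood with respect to alpha
    to zero gives alpha_hat = -n / sum(log(1 - exp(-lam / x_i))).
    """
    s = sum(math.log1p(-math.exp(-lam / x)) for x in sample)
    return -len(sample) / s


def ml_plugin_cdf(x, sample, lam):
    """ML plug-in estimate of F(x): evaluate the CDF at alpha_hat."""
    return gied_cdf(x, mle_alpha(sample, lam), lam)


def ml_plugin_pdf(x, sample, lam):
    """ML plug-in estimate of f(x): evaluate the PDF at alpha_hat."""
    return gied_pdf(x, mle_alpha(sample, lam), lam)
```

A simulation study of the kind the abstract describes would draw repeated samples with `gied_quantile`, compute the plug-in estimates at fixed points, and average the squared errors against `gied_pdf` and `gied_cdf` at the true parameters.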

Multi-Instance Dimensionality Reduction via Sparsity and Orthogonality

Neural Computation ◽

10.1162/neco_a_01140 ◽

2018 ◽

Vol 30 (12) ◽

pp. 3281-3308

Author(s):

Hong Zhu ◽

Li-Zhi Liao ◽

Michael K. Ng

Keyword(s):

Dimensionality Reduction ◽

Optimization Problem ◽

Augmented Lagrangian ◽

Main Idea ◽

Real Data ◽

Learning Performance ◽

High Dimensional ◽

Data Sets ◽

Outer Loop ◽

Orthogonality Constraints

We study a multi-instance (MI) learning dimensionality-reduction algorithm through sparsity and orthogonality, which is especially useful for high-dimensional MI data sets. We develop a novel algorithm to handle both sparsity and orthogonality constraints that existing methods do not handle well simultaneously. Our main idea is to formulate an optimization problem where the sparse term appears in the objective function and the orthogonality term is formed as a constraint. The resulting optimization problem can be solved by using approximate augmented Lagrangian iterations as the outer loop and inertial proximal alternating linearized minimization (iPALM) iterations as the inner loop. The main advantage of this method is that both sparsity and orthogonality can be satisfied in the proposed algorithm. We show the global convergence of the proposed iterative algorithm. We also demonstrate that the proposed algorithm can achieve high sparsity and orthogonality requirements, which are very important for dimensionality reduction. Experimental results on both synthetic and real data sets show that the proposed algorithm can obtain learning performance comparable to that of other tested MI learning algorithms.

A Support Based Initialization Algorithm for Categorical Data Clustering

Journal of Information Technology Research ◽

10.4018/jitr.2018040104 ◽

2018 ◽

Vol 11 (2) ◽

pp. 53-67

Author(s):

Ajay Kumar ◽

Shishir Kumar

Keyword(s):

Categorical Data ◽

Selection Process ◽

Numerical Data ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Data Object ◽

Data Points ◽

Wu Method ◽

Selection Algorithms

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to a categorical data set. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to get the support of every row. Further, the data object with the largest support is chosen as an initial center, followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
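The selection steps in this abstract can be sketched as follows. This is one plausible reading rather than the authors' algorithm: support is taken to be the relative frequency of a value within its attribute, row support is the sum of those frequencies, distance is the simple-matching (Hamming) distance, and ties are broken by row order. All function names are hypothetical.

```python
from collections import Counter


def row_supports(rows):
    """Sum, for each row, the relative frequency of each of its attribute values."""
    n = len(rows)
    counts = [Counter(col) for col in zip(*rows)]   # one counter per attribute
    return [sum(counts[j][v] / n for j, v in enumerate(row)) for row in rows]


def hamming(a, b):
    """Simple-matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def initial_centers(rows, k):
    """Pick up to k distinct initial centers from a list of categorical tuples."""
    supports = row_supports(rows)
    # The first center is the object with the largest support.
    first = max(range(len(rows)), key=lambda i: supports[i])
    centers = [rows[first]]
    # The other centers are the objects farthest from the first center.
    ranked = sorted(range(len(rows)), key=lambda i: -hamming(rows[i], rows[first]))
    for i in ranked:
        if len(centers) == k:
            break
        if rows[i] not in centers:    # skip duplicates of chosen centers
            centers.append(rows[i])
    return centers
```

For example, with rows `("red", "small")`, `("red", "small")`, `("red", "large")`, `("blue", "small")`, and `("green", "tiny")`, the first center is `("red", "small")` because both of its values are the most frequent in their attributes, and the second is `("green", "tiny")`, which differs from it on every attribute.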

Fast effect size shrinkage software for beta-binomial models of allelic imbalance

F1000Research ◽

10.12688/f1000research.20916.2 ◽

2020 ◽

Vol 8 ◽

pp. 2024

Author(s):

Joshua P. Zitovsky ◽

Michael I. Love

Keyword(s):

Allelic Imbalance ◽

Real Data ◽

Shrinkage Estimators ◽

Data Set ◽

Bayesian Shrinkage ◽

In Cis ◽

Posterior Estimation ◽

Binomial Models ◽

Better Than ◽

Diploid Organism

Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
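The high variance at small counts that motivates this abstract is easy to see with the two non-Bayesian estimators it mentions. The sketch below is illustrative only: the function names and the default pseudocount of 1 are assumptions, and the apeglm and ash shrinkage estimators are not reproduced.

```python
def ml_proportion(ref_count, alt_count):
    """Maximum likelihood estimate of the reference-allele proportion."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads observed for this gene")
    return ref_count / total


def pseudocount_proportion(ref_count, alt_count, pseudo=1.0):
    """Estimate after adding `pseudo` to each allele (pseudo=1 is an assumption)."""
    return (ref_count + pseudo) / (ref_count + alt_count + 2.0 * pseudo)
```

With 3 reference reads and 0 alternate reads, the ML estimate is 1.0, an extreme value resting on very little evidence, while the pseudocount estimate is pulled toward 0.5, giving 0.8. With 300 versus 0 reads the two estimates nearly agree, which is why the problem is specific to small counts.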

Extreme Learning Machine with sigmoid activation function on large data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1433.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3523-3526

Keyword(s):

Efficient Algorithm ◽

Large Data ◽

Activation Function ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Learning Machine ◽

Sigmoid Activation Function ◽

State Of Art ◽

Better Than

This paper describes an efficient algorithm for classification on large data sets. While many classification algorithms exist, they are not suitable for larger contents and different data sets. Various extreme learning machine (ELM) algorithms for working with large data sets are available in the literature. However, the existing algorithms use a fixed activation function, which may lead to deficiencies when working with large data. In this paper, we propose a novel ELM that uses a sigmoid activation function. The experimental evaluations demonstrate that our ELM-S algorithm performs better than ELM, SVM, and other state-of-the-art algorithms on large data sets.

Extended Odd Fréchet-G Family of Distributions

Journal of Probability and Statistics ◽

10.1155/2018/2931326 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 6

Author(s):

Suleman Nasiru

Keyword(s):

Maximum Likelihood Method ◽

Real Data ◽

Likelihood Method ◽

Data Sets ◽

Data Set ◽

New Class ◽

New Family ◽

Rate Functions ◽

The Given ◽

Family Of Distributions

The need to develop generalizations of existing statistical distributions to make them more flexible in modeling real data sets is vital in parametric statistical modeling and inference. Thus, this study develops a new class of distributions called the extended odd Fréchet family of distributions for modifying existing standard distributions. Two special models named the extended odd Fréchet Nadarajah-Haghighi and extended odd Fréchet Weibull distributions are proposed using the developed family. The densities and the hazard rate functions of the two special distributions exhibit different kinds of monotonic and nonmonotonic shapes. The maximum likelihood method is used to develop estimators for the parameters of the new class of distributions. The application of the special distributions is illustrated by means of a real data set. The results revealed that the special distributions developed from the new family can provide reasonable parametric fit to the given data set compared to other existing distributions.

Novel Multi-Level Dynamic Traffic Load-Balancing Protocol for Data Center

Symmetry ◽

10.3390/sym11020145 ◽

2019 ◽

Vol 11 (2) ◽

pp. 145 ◽

Cited By ~ 2

Author(s):

Sheeba Memon ◽

Jiawei Huang ◽

Hussain Saajid ◽

Naadiya Khuda Bux ◽

Arshad Saleem ◽

...

Keyword(s):

Load Balancing ◽

Data Center ◽

State Of The Art ◽

Traffic Load ◽

Parameter Setting ◽

Dynamic Traffic ◽

Typical Data ◽

Fixed Parameter ◽

Multi Level ◽

Traffic Load Balancing

Typically, production data centers operate under various risk factors, such as network dynamicity, topological asymmetry, and switch failures. Hence, load-balancing schemes should consider sensing path conditions accurately as well as reducing failures. However, under dynamic traffic, current load-balancing schemes use a fixed parameter setting, resulting in suboptimal performance. Therefore, we propose a multi-level dynamic traffic load-balancing (MDTLB) protocol, which uses an adaptive approach to parameter setting. The simulation results show that MDTLB outperforms the state-of-the-art schemes in terms of both flow completion time and throughput in typical data center applications.

Estimating Intersection Control Delay Using Large Data Sets of Travel Time from a Global Positioning System

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198105191700103 ◽

2005 ◽

Vol 1917 (1) ◽

pp. 18-27

Author(s):

Brian Hoeschen ◽

Darcy Bullock ◽

Mark Schlappi

Keyword(s):

Travel Time ◽

Traffic Engineering ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Control Delay ◽

Diverse Data ◽

Intersection Control ◽

Better Than

Historically, stopped delay was used to characterize the operation of intersection movements because it was relatively easy to measure. During the past decade, the traffic engineering community has moved away from using stopped delay and now uses control delay. That measurement is more precise but quite difficult to extract from large data sets if strict definitions are used to derive the data. This paper evaluates two procedures for estimating control delay. The first is based on a historical approximation that control delay is 30% larger than stopped delay. The second is new and based on segment delay. The procedures are applied to a diverse data set collected in Phoenix, Arizona, and compared with control delay calculated by using the formal definition. The new approximation was observed to be better than the historical stopped delay procedure; it provided an accurate prediction of control delay. Because it is an approximation, this methodology would be most appropriately applied to large data sets collected from travel time studies for ranking and prioritizing intersections for further analysis.
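The historical approximation mentioned in this abstract is a single multiplication, shown below as a small sketch. The function name and the use of seconds as the unit are assumptions; the segment-delay procedure is not sketched because the abstract does not describe how it is computed.

```python
def control_delay_from_stopped(stopped_delay_s, factor=1.3):
    """Historical approximation: control delay is 30% larger than stopped delay.

    `stopped_delay_s` is the measured stopped delay; seconds are an assumed unit.
    """
    return factor * stopped_delay_s
```

For example, a measured stopped delay of 20 s corresponds to an estimated control delay of 1.3 × 20 = 26 s under this rule.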

Subtractive Clustering and Particle Swarm Optimization Based Fuzzy Classifier

International Journal of Fuzzy System Applications ◽

10.4018/ijfsa.2019070105 ◽

2019 ◽

Vol 8 (3) ◽

pp. 108-122 ◽

Cited By ~ 1

Author(s):

Halima Salah ◽

Mohamed Nemissi ◽

Hamid Seridi ◽

Herman Akdag

Keyword(s):

Particle Swarm Optimization ◽

State Of The Art ◽

Particle Swarm ◽

Main Idea ◽

Fuzzy Rule ◽

Rule Base ◽

Subtractive Clustering ◽

Data Sets ◽

Fuzzy Classifier ◽

Swarm Optimization

Setting a compact and accurate rule base constitutes the principal objective in designing fuzzy rule-based classifiers. In this regard, the authors propose a design scheme based on the combination of subtractive clustering (SC) and particle swarm optimization (PSO). The main idea is to apply SC to each class separately, with a different radius for each, in order to generate more accurate regions, and to represent each region by a fuzzy rule. However, the number of rules is then affected by the radii, which are the main preset parameters of SC. PSO is therefore used to determine the optimal radii. To obtain a good compromise between accuracy and compactness, the authors propose using a multi-objective function for PSO. The performance of the proposed method is tested on well-known data sets and compared with several state-of-the-art methods.
