Efficient Hyperparameter Tuning with Grid Search for Text Categorization using kNN Approach with BM25 Similarity

Raji Ghawi; Jürgen Pfeffer

doi:10.1515/comp-2019-0011

Open Computer Science, 2019, Vol 9 (1), pp. 160-180 (cited by ~5)

Keyword(s): Machine Learning, Text Categorization, Learning Algorithm, Efficient Technique, Grid Search, Order Of Magnitude, Speed Up

Abstract: In machine learning, hyperparameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. Several approaches have been widely adopted for hyperparameter tuning, which is typically a time-consuming process. We propose an efficient technique to speed up the process of hyperparameter tuning with Grid Search. We applied this technique to text categorization using the kNN algorithm with BM25 similarity, where three hyperparameters need to be tuned. Our experiments show that our proposed technique is at least an order of magnitude faster than conventional tuning.
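
The abstract does not name the three hyperparameters or spell out the speed-up technique. A natural reading, assumed here, is BM25's `k1` and `b` plus the neighborhood size `k` of kNN. The sketch illustrates one general way a grid search over such a space can avoid redundant work: for each `(k1, b)` pair, BM25 similarities are computed and ranked once per test document, and every candidate `k` is then evaluated from that same ranking. It is a minimal illustration under those assumptions, not necessarily the authors' technique; the helper names (`bm25_index`, `bm25_scores`, `knn_vote`, `grid_search`) and the non-negative IDF variant are choices made for this sketch.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Doc = List[str]

def bm25_index(docs: Sequence[Doc]):
    """Precompute term frequencies, document lengths, average length and IDF."""
    tfs = [Counter(d) for d in docs]
    lengths = [len(d) for d in docs]
    avgdl = sum(lengths) / len(lengths)
    df = Counter(t for tf in tfs for t in tf)   # document frequency of each term
    n = len(docs)
    # Non-negative IDF variant (an assumption): log((N - df + 0.5) / (df + 0.5) + 1).
    idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0) for t, c in df.items()}
    return tfs, lengths, avgdl, idf

def bm25_scores(query: Doc, index, k1: float, b: float) -> List[float]:
    """BM25 similarity between one tokenized test document and every training document."""
    tfs, lengths, avgdl, idf = index
    terms = list(dict.fromkeys(query))          # unique query terms, in a fixed order
    scores = []
    for tf, dl in zip(tfs, lengths):
        norm = k1 * (1.0 - b + b * dl / avgdl)  # document-length normalization
        s = 0.0
        for t in terms:
            f = tf.get(t, 0)
            if f:
                s += idf[t] * f * (k1 + 1.0) / (f + norm)
        scores.append(s)
    return scores

def knn_vote(ranked_labels: List[str], k: int) -> str:
    """Majority label among the k nearest neighbors; ties go to the label ranked first."""
    top = ranked_labels[:k]
    counts = Counter(top)
    best = max(counts.values())
    return next(lab for lab in top if counts[lab] == best)

def grid_search(train_docs, train_labels, test_docs, test_labels, k1_grid, b_grid, k_grid):
    """Return ((k1, b, k), accuracy) for the first grid point with the highest accuracy.

    Similarities are computed once per (k1, b) pair; every k is then scored from the
    same ranked neighbor list, so the expensive step is not repeated for each k.
    """
    index = bm25_index(train_docs)
    best: Tuple = (None, -1.0)
    for k1 in k1_grid:
        for b in b_grid:
            ranked = []
            for q in test_docs:
                scores = bm25_scores(q, index, k1, b)
                order = sorted(range(len(scores)), key=lambda i: -scores[i])  # stable on ties
                ranked.append([train_labels[i] for i in order])
            for k in k_grid:
                correct = sum(knn_vote(r, k) == y for r, y in zip(ranked, test_labels))
                accuracy = correct / len(test_labels)
                if accuracy > best[1]:
                    best = ((k1, b, k), accuracy)
    return best
```

The saving comes from the loop order: similarity computation scales with the number of `(k1, b)` pairs, while the innermost `k` loop only re-counts votes over lists that are already sorted. Putting `k` outermost instead would recompute every similarity once per `k`.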

Active learning of deep surrogates for PDEs: application to metasurface design

npj Computational Materials, 2020, Vol 6 (1). doi:10.1038/s41524-020-00431-2

Author(s): Raphaël Pestourie; Youssef Mroueh; Thanh V. Nguyen; Payel Das; Steven G. Johnson

Keyword(s): Machine Learning, Active Learning, Large Scale, Learning Algorithm, Engineering Optimization, Optical Surface, Optical Wavelength, Random Samples, Training Cost, Order Of Magnitude

Abstract: Surrogate models for partial differential equations are widely used in the design of metamaterials to rapidly evaluate the behavior of composable components. However, the training cost of accurate surrogates by machine learning can rapidly increase with the number of variables. For photonic-device models, we find that this training becomes especially challenging as design regions grow larger than the optical wavelength. We present an active-learning algorithm that, for a neural-network (NN) surrogate model of optical-surface components, reduces the number of simulations required by more than an order of magnitude compared to uniform random sampling. Results show that the surrogate evaluation is over two orders of magnitude faster than a direct solve, and we demonstrate how this can be exploited to accelerate large-scale engineering optimization.

Multi-Instance Learning with MultiObjective Genetic Programming

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 1372-1379. doi:10.4018/978-1-60566-010-3.ch212

Author(s): Amelia Zafra

Keyword(s): Machine Learning, Genetic Programming, Learning Community, Text Categorization, Learning Algorithm, Programming Algorithm, Learning Framework, Wide Range, Training Examples

The multiple-instance problem is a difficult machine learning problem that arises when knowledge about the training examples is incomplete. In this problem, the teacher labels examples that are sets (also called bags) of instances, but does not label whether each individual instance in a bag is positive or negative. The learning algorithm must generate a classifier that correctly classifies unseen examples (i.e., bags of instances). This learning framework has received growing attention in the machine learning community, and since it was introduced by Dietterich, Lathrop, and Lozano-Perez (1997), a wide range of tasks have been formulated as multi-instance problems. Among these tasks are content-based image retrieval (Chen, Bi, & Wang, 2006) and annotation (Qi & Han, 2007), text categorization (Andrews, Tsochantaridis, & Hofmann, 2002), web index page recommendation (Zhou, Jiang, & Li, 2005; Xue, Han, Jiang, & Zhou, 2007), and drug activity prediction (Dietterich et al., 1997; Zhou & Zhang, 2007). In this chapter we introduce MOG3P-MI, a multiobjective grammar-guided genetic programming algorithm for multi-instance problems. In this algorithm, which is based on SPEA2, individuals represent classification rules that determine whether a bag is positive or negative. The quality of each individual is evaluated with two quality indexes, sensitivity and specificity, both adapted to the multiple-instance learning (MIL) setting. Computational experiments show that MOG3P-MI is a robust classification algorithm across different domains: it achieves competitive results and produces classifiers with simple rules that add comprehensibility and simplicity to the knowledge discovery process, making it a suitable method for solving MIL problems (Zafra & Ventura, 2007).
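
Under the standard multi-instance assumption of Dietterich et al. (1997), a bag is positive if at least one of its instances is positive. Assuming that rule, and representing a classification rule as a plain Python predicate (MOG3P-MI itself evolves grammar-based rules, which are not reproduced here), the sketch computes bag-level sensitivity and specificity and compares two score pairs under Pareto dominance, the relation that multiobjective algorithms such as SPEA2 build on. The type aliases and function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Bag = Sequence[Instance]
Rule = Callable[[Instance], bool]

def classify_bag(rule: Rule, bag: Bag) -> bool:
    """Standard MI assumption: the bag is positive if any instance satisfies the rule."""
    return any(rule(x) for x in bag)

def sensitivity_specificity(rule: Rule, bags: List[Bag], labels: List[bool]) -> Tuple[float, float]:
    """Bag-level sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    tp = fn = tn = fp = 0
    for bag, positive in zip(bags, labels):
        predicted = classify_bag(rule, bag)
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance for objectives to maximize: a is no worse everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
```

A single instance that satisfies the rule is enough to turn a negative bag into a predicted positive, which is exactly the kind of error specificity penalizes; treating the two measures as separate objectives keeps that trade-off visible instead of collapsing it into one accuracy number.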

Design and performance analysis of energy efficient technique for wireless multimedia sensor networks using machine learning algorithm

2011 World Congress on Information and Communication Technologies, 2011. doi:10.1109/wict.2011.6141406 (cited by ~1)

Author(s): Kibrewerk Akalu; Kumudha Raimond

Keyword(s): Machine Learning, Sensor Networks, Performance Analysis, Energy Efficient, Learning Algorithm, Efficient Technique, Machine Learning Algorithm, Wireless Multimedia, Multimedia Sensor Networks, And Performance

A Machine Learning Algorithm in Automated Text Categorization of Legacy Archives

8th International Conference on Soft Computing, Artificial Intelligence and Applications, 2019. doi:10.5121/csit.2019.90701

Author(s): Dali Wang; Ying Bai; David Hamblin

Keyword(s): Machine Learning, Text Categorization, Learning Algorithm, Machine Learning Algorithm

Performance based Machine Learning Algorithm for Topic Oriented Text Categorization

International Journal of Recent Technology and Engineering - 2, 2019, Vol 8 (2S11), pp. 3501-3506. doi:10.35940/ijrte.b1429.0982s1119

Keyword(s): Machine Learning, Text Categorization, Opinion Mining, Learning Algorithm, Data Set, The Social, Mining Methods, Newspaper Data, F Measure, The Web

With the growth of social news on the web, public opinion has gained major importance in decision-making. Researchers in text-based mining have carried out a number of evaluations using different data mining methods to classify opinions as positive, negative, or neutral. People's opinions are therefore mined as a source of social information, since readers take a strong interest in news reports. In this paper, a newspaper data set is used for opinion mining to evaluate sentiment. Sentiment analysis is used to gauge people's opinions before they form a judgment on a particular issue. Machine learning is one of the important approaches to sentiment analysis. Different methods, namely Naïve Bayes, SVM, Maximum Entropy, and SLDA, are used to classify the sentiments, and their predictions are evaluated on precision, recall, and F-measure to determine which method best suits the classification.
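
The comparison above rests on precision, recall, and F-measure. As a minimal sketch of how these are computed for a three-way positive/negative/neutral task, the function takes gold and predicted labels (the classifiers themselves are out of scope, and the label strings are hypothetical) and returns per-class scores. Macro-averaging the F-measure is one common way to summarize them; the abstract does not say which averaging the paper uses.

```python
from collections import Counter
from typing import Dict, List, Tuple

def per_class_prf(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall and F-measure (F1) for every label seen in gold or predicted data."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label receives a false positive
            fn[gold] += 1  # the gold label receives a false negative
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1)
    return report

def macro_f1(report: Dict[str, Tuple[float, float, float]]) -> float:
    """Unweighted mean of the per-class F-measures."""
    return sum(f1 for _, _, f1 in report.values()) / len(report)
```

Macro-averaging weights each sentiment class equally, so a method that ignores the rarer neutral class is penalized even if its overall accuracy looks good.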

A hybrid machine learning algorithm for designing quantum experiments

Quantum Machine Intelligence, 2019, Vol 1 (1-2), pp. 5-15. doi:10.1007/s42484-019-00003-8 (cited by ~15)

Author(s): L. O’Driscoll; R. Nichols; P. A. Knott

Keyword(s): Machine Learning, Learning Algorithm, Photon Number, Quantum States, Cubic Phase, Machine Learning Algorithm, Speed Up, Hybrid Machine, Cat States, Specific Quantum

Abstract: We introduce a hybrid machine learning algorithm for designing quantum optics experiments that produce specific quantum states. Our algorithm successfully found experimental schemes to produce all five states we asked it to, including Schrödinger cat states and cubic phase states, all with a fidelity of over 96%. Here, we specifically focus on designing realistic experiments, and hence all of the algorithm’s designs contain only experimental elements that are available with current technology. The core of our algorithm is a genetic algorithm that searches for optimal arrangements of the experimental elements; to speed up the initial search, we incorporate a neural network that classifies quantum states. The latter is of independent interest, as it quickly learned to accurately classify quantum states given their photon number distributions.

Machine Learning for Conservative-to-Primitive in Relativistic Hydrodynamics

Symmetry, 2021, Vol 13 (11), pp. 2157. doi:10.3390/sym13112157

Author(s): Tobias Dieselhorst; William Cook; Sebastiano Bernuzzi; David Radice

Keyword(s): Machine Learning, Neural Networks, Equation Of State, Benchmark Problems, Relativistic Hydrodynamics, Root Finding, Order Of Magnitude, Speed Up, The Neural Networks, Hydrodynamics Equations

The numerical solution of relativistic hydrodynamics equations in conservative form requires root-finding algorithms that invert the conservative-to-primitive variables map. These algorithms employ the equation of state of the fluid and can be computationally demanding for applications involving sophisticated microphysics models, such as those required to calculate accurate gravitational wave signals in numerical relativity simulations of binary neutron stars. This work explores the use of machine learning methods to speed up the recovery of primitives in relativistic hydrodynamics. Artificial neural networks are trained to replace either the interpolations of a tabulated equation of state or directly the conservative-to-primitive map. The application of these neural networks to simple benchmark problems shows that both approaches improve over traditional root finders with tabular equation-of-state and multi-dimensional interpolations. In particular, the neural networks for the conservative-to-primitive map accelerate the variable recovery by more than an order of magnitude over standard methods while maintaining accuracy. Neural networks are thus an interesting option to improve the speed and robustness of relativistic hydrodynamics algorithms.

Applying an existing machine learning algorithm to text categorization

Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, Lecture Notes in Computer Science, 1996, pp. 343-354. doi:10.1007/3-540-60925-3_58 (cited by ~14)

Author(s): Isabelle Moulinier; Jean-Gabriel Ganascia

Keyword(s): Machine Learning, Text Categorization, Learning Algorithm, Machine Learning Algorithm

Hyper-Parameter Optimization and Evaluation on Selected Machine Learning Algorithm Using Hepatitis Dataset

FUDMA Journal of Sciences, 2021, Vol 5 (2), pp. 447-455. doi:10.33003/fjs-2021-0502-649

Author(s): Aminat Yusuf; Oyelola Akande

Keyword(s): Machine Learning, Expert Knowledge, Learning Algorithm, Area Under The Curve, Machine Learning Techniques, Support Vector, Grid Search, Classification Problems, Comparative Performance, Learning Techniques

Despite the popularity and utility of most machine learning techniques, expert knowledge is required to guide choices about which technique and settings are suitable for a specific problem. Without such expertise, the procedures are vulnerable to poor parameter settings. Many machine learning techniques are offered with default settings. However, since different classification problems require suitable techniques, selecting the appropriate technique and tuning its settings are vital steps for improving the reliability and accuracy of predictions. This study performs grid search parameter tuning on five selected machine learning techniques for hepatitis disease and compares the results side by side with the default settings. The experimental results for the five tuned techniques show that the configurations suggested in this work yield predictions of considerably higher quality than the default settings. Tuning the parameters of the Support Vector Machine via grid search yields the best accuracy, 90%, with competitive performance in terms of precision, recall, accuracy, and Area Under the Curve. The study also presents combinations of parameter settings for each technique, identifying ranges of values for each setting that give good outcomes for hepatitis disease.

pNeRF: Parallelized Conversion from Internal to Cartesian Coordinates

2018. doi:10.1101/385450

Author(s): Mohammed AlQuraishi

Keyword(s): Neural Network, Machine Learning, Molecular Modeling, Reference Frame, Natural Extension, Computational Cost, Cartesian Coordinates, Order Of Magnitude, Speed Up, Coordinate Conversion

Abstract: The conversion of polymer parameterization from internal coordinates (bond lengths, angles, and torsions) to Cartesian coordinates is a fundamental task in molecular modeling, often performed using the Natural Extension Reference Frame (NeRF) algorithm. NeRF can be parallelized to process multiple polymers simultaneously, but is not parallelizable along the length of a single polymer. A mathematically equivalent algorithm, pNeRF, has been derived that is parallelizable along a polymer’s length. Empirical analysis demonstrates an order-of-magnitude speed up using modern GPUs and CPUs. In machine learning-based workflows, in which partial derivatives are backpropagated through NeRF equations and neural network primitives, switching to pNeRF can reduce the fractional computational cost of coordinate conversion from over two-thirds to around 10%. An optimized TensorFlow-based implementation of pNeRF is available on GitHub.
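
NeRF places each atom from the three atoms placed before it, which is why it cannot be parallelized along a single chain. The sketch shows that sequential step in plain Python, assuming the common NeRF conventions: `bond` is the C-D distance, `angle` the B-C-D bond angle, and `torsion` the A-B-C-D dihedral, with angles in radians. It does not implement pNeRF's parallel reformulation; the loop in the hypothetical helper `build_chain` is the dependency that pNeRF is designed to remove.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def place_atom(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """Place atom D from the three preceding atoms A, B, C (one NeRF step)."""
    # Position of D in a local frame centered on C.
    d2 = (-bond * math.cos(angle),
          bond * math.sin(angle) * math.cos(torsion),
          bond * math.sin(angle) * math.sin(torsion))
    bc = normalize(sub(c, b))               # unit vector along the B->C bond
    n = normalize(cross(sub(b, a), bc))     # unit normal of the plane through A, B, C
    m = cross(n, bc)                        # completes the right-handed frame (bc, m, n)
    # Rotate d2 into the global frame, then translate by C.
    return tuple(c[i] + bc[i] * d2[0] + m[i] * d2[1] + n[i] * d2[2] for i in range(3))

def build_chain(first_three: Sequence[Vec], internals: Sequence[Tuple[float, float, float]]) -> List[Vec]:
    """Sequential NeRF: every new atom depends on the three atoms placed just before it."""
    coords = list(first_three)
    for bond, angle, torsion in internals:
        coords.append(place_atom(coords[-3], coords[-2], coords[-1], bond, angle, torsion))
    return coords
```

Because each call to `place_atom` reads the last three entries of `coords`, atom *i* cannot be computed before atom *i - 1*; in this form, the only available parallelism is across independent chains.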
