Efficient Hyperparameter Tuning with Grid Search for Text Categorization using kNN Approach with BM25 Similarity

2019 ◽  
Vol 9 (1) ◽  
pp. 160-180 ◽  
Author(s):  
Raji Ghawi ◽  
Jürgen Pfeffer

Abstract In machine learning, hyperparameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. Several approaches have been widely adopted for hyperparameter tuning, which is typically a time-consuming process. We propose an efficient technique to speed up hyperparameter tuning with Grid Search. We applied this technique to text categorization using the kNN algorithm with BM25 similarity, where three hyperparameters need to be tuned. Our experiments show that the proposed technique is at least an order of magnitude faster than conventional tuning.
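
The abstract does not describe the speed-up technique itself, so the following is only a minimal sketch of grid search over three kNN/BM25 hyperparameters, assuming they are the neighbour count k and the BM25 parameters k1 and b. The function names, the toy-scale scoring routine, and the idea of computing each neighbour ranking once per (k1, b) pair and reusing it for every k are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter
from itertools import product


def bm25_scores(query, docs, k1, b):
    """Score every tokenized document against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in set(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def grid_search(train, valid, ks, k1s, bs):
    """Return the best (k, k1, b) and the validation accuracy of every combination.

    Assumption for illustration: the neighbour ranking depends only on (k1, b),
    so it is computed once per pair and sliced for each candidate k.
    """
    docs = [d for d, _ in train]
    labels = [y for _, y in train]
    results = {}
    for k1, b in product(k1s, bs):
        rankings = []
        for q, _ in valid:
            s = bm25_scores(q, docs, k1, b)
            rankings.append(sorted(range(len(docs)), key=lambda i: -s[i]))
        for k in ks:
            correct = 0
            for (_, y), order in zip(valid, rankings):
                votes = Counter(labels[i] for i in order[:k])
                correct += votes.most_common(1)[0][0] == y
            results[(k, k1, b)] = correct / len(valid)
    best = max(results, key=results.get)
    return best, results
```

Calling `grid_search(train, valid, ks=[1, 3], k1s=[1.2, 2.0], bs=[0.5, 0.75])` on lists of `(tokens, label)` pairs evaluates all eight combinations while scoring the validation documents only four times.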

2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Raphaël Pestourie ◽  
Youssef Mroueh ◽  
Thanh V. Nguyen ◽  
Payel Das ◽  
Steven G. Johnson

Abstract Surrogate models for partial differential equations are widely used in the design of metamaterials to rapidly evaluate the behavior of composable components. However, the training cost of accurate surrogates by machine learning can rapidly increase with the number of variables. For photonic-device models, we find that this training becomes especially challenging as design regions grow larger than the optical wavelength. We present an active-learning algorithm that reduces the number of simulations required by more than an order of magnitude for an NN surrogate model of optical-surface components compared to uniform random samples. Results show that the surrogate evaluation is over two orders of magnitude faster than a direct solve, and we demonstrate how this can be exploited to accelerate large-scale engineering optimization.


Author(s):  
Amelia Zafra

The multiple-instance problem is a difficult machine learning problem that appears in cases where knowledge about training examples is incomplete. In this problem, the teacher labels examples that are sets (also called bags) of instances. The teacher does not label whether an individual instance in a bag is positive or negative. The learning algorithm needs to generate a classifier that will correctly classify unseen examples (i.e., bags of instances). This learning framework is receiving growing attention in the machine learning community, and since it was introduced by Dietterich, Lathrop, and Lozano-Perez (1997), a wide range of tasks have been formulated as multi-instance problems. Among these tasks, we can cite content-based image retrieval (Chen, Bi, & Wang, 2006) and annotation (Qi & Han, 2007), text categorization (Andrews, Tsochantaridis, & Hofmann, 2002), web index page recommendation (Zhou, Jiang, & Li, 2005; Xue, Han, Jiang, & Zhou, 2007) and drug activity prediction (Dietterich et al., 1997; Zhou & Zhang, 2007). In this chapter we introduce MOG3P-MI, a multiobjective grammar-guided genetic programming algorithm to handle multi-instance problems. In this algorithm, based on SPEA2, individuals represent classification rules that make it possible to determine whether a bag is positive or negative. The quality of each individual is evaluated according to two quality indexes: sensitivity and specificity. Both measures have been adapted to the multi-instance learning (MIL) setting. Computational experiments show that MOG3P-MI is a robust algorithm for classification in different domains, where it achieves competitive results and obtains classifiers containing simple rules that add comprehensibility and simplicity to the knowledge discovery process, making it a suitable method for solving MIL problems (Zafra & Ventura, 2007).
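
To make the two quality indexes concrete, here is a small sketch of bag-level sensitivity and specificity. It assumes the standard multi-instance hypothesis (a bag is predicted positive if at least one of its instances satisfies the rule); the chapter's exact adaptation is not given in this summary, and the function names are hypothetical.

```python
def classify_bag(bag, rule):
    """Predict a bag as positive if any instance satisfies the rule (assumed MI hypothesis)."""
    return any(rule(instance) for instance in bag)


def sensitivity_specificity(bags, labels, rule):
    """Compute bag-level sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    tp = fn = tn = fp = 0
    for bag, is_positive in zip(bags, labels):
        predicted = classify_bag(bag, rule)
        if is_positive and predicted:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

In a multiobjective setting such as the one described, the two returned values would be treated as separate objectives rather than combined into one score.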


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3501-3506

With the growth of societal news on the web, public opinion is given major importance in decision-making. Researchers in text mining have carried out a number of evaluations using a variety of data mining methods to classify opinions as positive, negative, or neutral. People's opinions are therefore mined as a source of social information, since people pay a great deal of attention to such reports. In this paper, a newspaper data set is used for opinion mining to evaluate sentiment. Sentiment analysis is used to gauge people's opinions before they pass judgment on a particular issue. Machine learning is one of the important approaches to sentiment analysis. Different methods, such as Naïve Bayes, SVM, Maximum Entropy, and SLDA, are used for classifying the sentiments. The methods are compared on precision, recall, and F-measure to determine which best suits the classification task.
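
As an illustration of the evaluation measures named above, the sketch below computes precision, recall, and F-measure for one sentiment class from lists of true and predicted labels. The label strings and the function name are assumptions for the example; the paper's data and classifiers are not reproduced here.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall, and F1 (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Running this once per class (positive, negative, neutral) for each classifier gives the figures on which the methods can be compared.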


2019 ◽  
Vol 1 (1-2) ◽  
pp. 5-15 ◽  
Author(s):  
L. O’Driscoll ◽  
R. Nichols ◽  
P. A. Knott

Abstract We introduce a hybrid machine learning algorithm for designing quantum optics experiments to produce specific quantum states. Our algorithm successfully found experimental schemes to produce all 5 states we asked it to, including Schrödinger cat states and cubic phase states, all to a fidelity of over 96%. Here, we specifically focus on designing realistic experiments, and hence all of the algorithm’s designs only contain experimental elements that are available with current technology. The core of our algorithm is a genetic algorithm that searches for optimal arrangements of the experimental elements, but to speed up the initial search, we incorporate a neural network that classifies quantum states. The latter is of independent interest, as it quickly learned to accurately classify quantum states given their photon number distributions.


Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2157
Author(s):  
Tobias Dieselhorst ◽  
William Cook ◽  
Sebastiano Bernuzzi ◽  
David Radice

The numerical solution of relativistic hydrodynamics equations in conservative form requires root-finding algorithms that invert the conservative-to-primitive variables map. These algorithms employ the equation of state of the fluid and can be computationally demanding for applications involving sophisticated microphysics models, such as those required to calculate accurate gravitational wave signals in numerical relativity simulations of binary neutron stars. This work explores the use of machine learning methods to speed up the recovery of primitives in relativistic hydrodynamics. Artificial neural networks are trained to replace either the interpolations of a tabulated equation of state or directly the conservative-to-primitive map. The application of these neural networks to simple benchmark problems shows that both approaches improve over traditional root finders with tabular equation-of-state and multi-dimensional interpolations. In particular, the neural networks for the conservative-to-primitive map accelerate the variable recovery by more than an order of magnitude over standard methods while maintaining accuracy. Neural networks are thus an interesting option to improve the speed and robustness of relativistic hydrodynamics algorithms.
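
For readers unfamiliar with the conservative-to-primitive inversion, the following is a deliberately simplified sketch of the traditional root-finding step. It assumes one-dimensional special-relativistic flow, an ideal-gas equation of state with adiabatic index 5/3, units with c = 1, and plain bisection on the pressure; the paper's tabulated microphysics and neural-network replacements are not shown, and all names are hypothetical.

```python
import math

GAMMA = 5.0 / 3.0  # adiabatic index of the assumed ideal-gas equation of state


def prim_to_cons(rho, v, p):
    """Map primitives (density, velocity, pressure) to conserved (D, S, tau)."""
    eps = p / ((GAMMA - 1.0) * rho)
    h = 1.0 + eps + p / rho          # specific enthalpy
    w = 1.0 / math.sqrt(1.0 - v * v)  # Lorentz factor
    d = rho * w
    s = rho * h * w * w * v
    tau = rho * h * w * w - p - d
    return d, s, tau


def _state_for_pressure(d, s, tau, p):
    """Primitive state implied by the conserved variables and a pressure guess."""
    v = s / (tau + d + p)
    w = 1.0 / math.sqrt(1.0 - v * v)
    rho = d / w
    eps = (tau + d * (1.0 - w) + p * (1.0 - w * w)) / (d * w)
    return rho, v, eps


def _residual(d, s, tau, p):
    """Mismatch between the equation-of-state pressure and the guess."""
    rho, _, eps = _state_for_pressure(d, s, tau, p)
    return (GAMMA - 1.0) * rho * eps - p


def cons_to_prim(d, s, tau, tol=1e-12):
    """Recover (rho, v, p) by bisection on the pressure residual."""
    lo = max(1e-14, abs(s) - tau - d + 1e-14)  # keeps |v| < 1
    hi = max(2.0 * lo, 1.0)
    while _residual(d, s, tau, hi) > 0.0:       # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _residual(d, s, tau, mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * max(1.0, mid):
            break
    p = 0.5 * (lo + hi)
    rho, v, _ = _state_for_pressure(d, s, tau, p)
    return rho, v, p
```

Each recovery costs dozens of residual evaluations per grid cell, which hints at why a cheaper approximation of this map is attractive when the equation of state is expensive to evaluate.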


2021 ◽  
Vol 5 (2) ◽  
pp. 447-455
Author(s):  
Aminat Yusuf ◽  
Oyelola Akande

Despite the popularity and utility of most machine learning techniques, expert knowledge is required to guide the choice of a suitable technique and of settings that work well for a specific problem. Without such expertise, the procedures are vulnerable to poor parameter settings. Many of these techniques are offered with default settings. However, since different classification problems require different machine learning techniques, selecting the appropriate technique and tuning its settings are vital tasks that improve the reliability and accuracy of predictions. This study performs grid search parameter tuning on five selected machine learning techniques applied to hepatitis disease. Performance is compared side by side with the default settings. The experimental results for the five tuned techniques show that the configurations suggested in our work yield predictions of considerably higher quality than the default settings. Tuning the parameters of the Support Vector Machine via grid search yields the best accuracy, 90%, and competitive performance on precision, recall, accuracy, and Area Under the Curve. We present combinations of parameter settings for each technique by identifying ranges of values for each setting that give good outcomes on hepatitis disease.


2018 ◽  
Author(s):  
Mohammed AlQuraishi

Abstract The conversion of polymer parameterization from internal coordinates (bond lengths, angles, and torsions) to Cartesian coordinates is a fundamental task in molecular modeling, often performed using the Natural Extension Reference Frame (NeRF) algorithm. NeRF can be parallelized to process multiple polymers simultaneously, but is not parallelizable along the length of a single polymer. A mathematically equivalent algorithm, pNeRF, has been derived that is parallelizable along a polymer’s length. Empirical analysis demonstrates an order-of-magnitude speedup using modern GPUs and CPUs. In machine learning-based workflows, in which partial derivatives are backpropagated through NeRF equations and neural network primitives, switching to pNeRF can reduce the fractional computational cost of coordinate conversion from over two-thirds to around 10%. An optimized TensorFlow-based implementation of pNeRF is available on GitHub.
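
The sequential dependence mentioned above can be seen in a short sketch of the basic NeRF placement step. This is a plain-Python illustration under assumed conventions (angles in radians, bond angle measured at the previous atom, torsion of zero meaning cis); it is not the pNeRF algorithm or the TensorFlow implementation, and the helper names are hypothetical.

```python
import math


def _sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    n = math.sqrt(sum(x * x for x in u))
    return tuple(x / n for x in u)


def place_atom(a, b, c, length, angle, torsion):
    """Place atom d from atoms a, b, c and the internal coordinates of d.

    length: |c-d| bond length; angle: b-c-d bond angle; torsion: a-b-c-d dihedral.
    """
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal of the a-b-c plane
    m = _cross(n, bc)                  # completes the local orthonormal frame
    x = -length * math.cos(angle)
    y = length * math.sin(angle) * math.cos(torsion)
    z = length * math.sin(angle) * math.sin(torsion)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))


def build_chain(seed, internals):
    """Extend three seed atoms one atom at a time; each step needs the previous three."""
    coords = list(seed)
    for length, angle, torsion in internals:
        coords.append(place_atom(coords[-3], coords[-2], coords[-1],
                                 length, angle, torsion))
    return coords
```

Because every call to `place_atom` consumes the three most recently placed atoms, the loop in `build_chain` cannot be split across a single chain without reformulating the computation, which is the limitation the abstract describes.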

