Data Reduction in the String Space for Efficient kNN Classification Through Space Partitioning

2020 ◽  
Vol 10 (10) ◽  
pp. 3356 ◽  
Author(s):  
Jose J. Valero-Mas ◽  
Francisco J. Castellanos

Within the Pattern Recognition field, two representations are generally considered for encoding data: statistical codifications, which describe elements as feature vectors, and structural representations, which encode elements as high-level symbolic data structures such as strings, trees or graphs. While the vast majority of classifiers can handle statistical spaces, only a few methods are suitable for structural representations. The kNN classifier is one of the few algorithms capable of tackling both statistical and structural spaces. This method relies on computing the dissimilarity between all the samples of the set, which is the main reason for its high versatility but also for its low efficiency. Prototype Generation is one way of mitigating this issue. These mechanisms generate a reduced version of the initial dataset by applying data transformation and aggregation processes to the initial collection. Nevertheless, these generation processes depend heavily on the data representation considered and are generally not well defined for structural data. In this work we present the adaptation of the generation-based reduction algorithm Reduction through Homogeneous Clusters to the case of string data. This algorithm performs the reduction by partitioning the space into class-homogeneous clusters and then generating a representative prototype as the median value of each group. Thus, the main issue to tackle is the retrieval of the median element of a set of strings. Our comprehensive experimentation comparatively assesses the performance of this algorithm in both the statistical and the string-based spaces. Results prove the relevance of our approach by showing a competitive compromise between classification rate and data reduction.
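To make the efficiency concern concrete, the following is a minimal sketch of exhaustive kNN classification over strings. It assumes the Levenshtein edit distance as the dissimilarity measure, which the abstract does not specify; the function names and the toy training set are hypothetical and are not taken from the paper.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def knn_classify(query: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training strings.

    Every query is compared against every training sample, which is the
    exhaustive search that data reduction tries to make cheaper.
    """
    neighbours = sorted(training, key=lambda pair: edit_distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (string, class label) pairs.
training = [
    ("abab", "A"), ("abba", "A"), ("aabb", "A"),
    ("xyzz", "B"), ("xyyz", "B"), ("zyxz", "B"),
]
label = knn_classify("abaa", training, k=3)
```

Each query costs one distance computation per training sample, and each edit distance is itself quadratic in string length, so shrinking the training set has a direct effect on classification time.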

2021 ◽  
Author(s):  
Francisco J. Castellanos ◽  
Jose J. Valero-Mas ◽  
Jorge Calvo-Zaragoza

The k-nearest neighbor (kNN) rule is one of the best-known distance-based classifiers, and is usually associated with high performance and versatility, as it requires only the definition of a dissimilarity measure. Nevertheless, kNN also suffers from low efficiency since, for each new query, the algorithm must carry out an exhaustive search of the training data. This drawback is much more relevant when considering complex structural representations, such as graphs, trees or strings, owing to the cost of the dissimilarity metrics. This issue has generally been tackled through the use of data reduction (DR) techniques, which reduce the size of the reference set, but the complexity of structural data has historically limited their application in the aforementioned scenarios. A DR algorithm called reduction through homogeneous clusters (RHC) has recently been adapted to string representations but, as obtaining the exact median value of a set of string data is known to be computationally difficult, its authors resorted to computing the set-median value. Under the premise that a more exact median value may be beneficial in this context, we present a new adaptation of the RHC algorithm for string data, in which an approximate median computation is carried out. The results obtained show significant improvements when compared to those of the set-median version of the algorithm, in terms of both classification performance and reduction rates.
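The set-median mentioned above can be illustrated with a short sketch. It assumes the Levenshtein edit distance as the dissimilarity, and the function names and example cluster are hypothetical. The set median is the member of the set with the smallest summed distance to all the others, so it is always drawn from the set itself, whereas the exact median string may lie outside it.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def set_median(strings: list[str]) -> str:
    """Return the element of `strings` minimising the sum of distances to the set."""
    if not strings:
        raise ValueError("set_median requires at least one string")
    return min(strings, key=lambda cand: sum(edit_distance(cand, s) for s in strings))


# Hypothetical class-homogeneous cluster; its set median would act as the prototype.
cluster = ["casa", "cosa", "cara", "masa"]
prototype = set_median(cluster)
```

The search needs a number of distance computations quadratic in the cluster size, but it never constructs a new string. That restriction is what an approximate median computation relaxes.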


Author(s):  
A.G. Aganbegyan ◽  

A.G. Aganbegyan considers the knowledge economy as the main component of human capital. He analyzes certain areas of the knowledge economy (R&D, healthcare and education) by comparing relevant results demonstrated by Russian regions with similar indices reported for other countries. The article points out positive and negative aspects, e.g. high level and quality of education v. low efficiency of its application; lower cancer mortality rate and particularly child mortality rate v. high mortality from cardiovascular disease among working-age population, etc. Major causes of negative phenomena include insufficient funding of the public sector and inefficient administration. In order to remedy the situation, the author recommends the development of a new federal budget, transition to national economic planning and adjustment of national projects.


Author(s):  
Kia Ng

This chapter describes an optical document imaging system that transforms paper-based music scores and manuscripts into machine-readable format, and a restoration system that touches up small imperfections (for example, broken stave lines and stems) to restore deteriorated master copies for reprinting. The chapter presents a brief background of this field, discusses the main obstacles, and presents the processes involved in processing printed music scores, using a divide-and-conquer approach to sub-segment compound musical symbols (e.g., chords) and inter-connected groups (e.g., beamed quavers) into lower-level graphical primitives (e.g., lines and ellipses) before recognition and reconstruction. This is followed by a discussion of the development of a prototype for handwritten manuscripts, with a segmentation approach to separate handwritten musical primitives. Issues and approaches for recognition, reconstruction and revalidation using basic music syntax and high-level domain knowledge, as well as data representation, are also presented.


2008 ◽  
Vol 18 (03) ◽  
pp. 195-205 ◽  
Author(s):  
WEIBAO ZOU ◽  
ZHERU CHI ◽  
KING CHUEN LO

Image classification is a challenging problem in organizing a large image database, and an effective method for this purpose is still under investigation. This paper presents a method based on wavelet analysis to extract features for image classification. After an image is decomposed by wavelet, the statistics of its features can be obtained from the distribution of histograms of wavelet coefficients, which are projected onto two orthogonal axes, i.e., the x and y directions. The nodes of the tree representation of images can therefore be represented by this distribution. The high-level features are described in a low-dimensional space of 16 attributes, so the computational complexity is significantly decreased. 2800 images from seven categories are used in the experiments. Half of the images were used for training a neural network and the other half for testing. The features extracted by wavelet analysis and conventional features are both used in the experiments to demonstrate the efficacy of the proposed method. The classification rate on the training data set with wavelet analysis is up to 91%, and the classification rate on the testing data set reaches 89%. Experimental results show that the proposed approach to image classification is more effective.


Author(s):  
M.G. Diskin ◽  
T.G. McEvoy ◽  
J.M. Sreenan

In suckler beef production it is estimated that 55% of the total cost is required to maintain and replace the breeding females while only 10% of total feed energy intake is stored in the tissue of the calves and cows. The low reproductive rate of the cow is primarily responsible for this low efficiency. Even in a well-managed herd, weaning rate is about 0.95 calves per cow per annum or less. It is frequently hypothesised that increasing litter size by inducing twin calving would increase output and biological and economic efficiency, provided few extra inputs were required. Although twinning may increase the efficiency of beef production, spontaneous twin-calving is frequently associated with an increased incidence of calving problems, poor calf survival, retained placentae and longer rebreeding intervals. Such problems related to twin-calving cannot be studied unless the frequency of twinning is increased above the levels that occur spontaneously. Embryo transfer can be used to increase the frequency of twin calving, thus allowing a better assessment of the potential to increase output. A suckler herd, with a high level of twinning, was established to determine the effects of litter size on calving performance and calf survival rates.


2012 ◽  
Vol 15 (2) ◽  
pp. 79-99
Author(s):  
Zbigniew Przygodzki

Human capital and knowledge are the most important factors in current development processes, contributing to the innovativeness and competitiveness of economies. The important role of these factors was also underlined in the Europe 2020 Strategy. However, due to the immaterial character of investment in human capital and the high level of decentralization of human capital development policy, these actions are characterized by relatively low efficiency. Thus, the aim of this paper is, firstly, to identify the importance of human capital development policy within EU policies. Secondly, it is to identify and conduct a comparative analysis of national differences in human capital development and to identify points of reference for key measures of that development. Thirdly, the paper specifies models of human capital development policy from the perspective of how involved local authorities are in its implementation and efficiency.


1989 ◽  
Vol 1989 (1) ◽  
pp. 337-342
Author(s):  
François Merlin ◽  
Christian Bocard ◽  
Gilles Castaing

A great deal of information on the use of dispersants has become available over the past 10 years through offshore and meso-scale trials. A state-of-the-art review shows that, among the key factors that have been identified, the contact between dispersant and oil is of utmost importance. A better knowledge of this parameter should be taken into account in defining operational procedures, especially when applying dispersants by ship, which is considered to be complementary to aerial spraying. Upon request of the French Navy, a series of meso-scale trials was carried out off Brittany in June 1987, according to the methodology previously used in 1984. Three dispersants were sprayed from a boat. It was concluded that a high level of energy at the sea surface mitigates discrepancies in dispersants' efficiencies as measured in laboratory tests. Better results were obtained in the case of relatively thick oil slicks. The low efficiency that was measured when treating downwind was attributed to the already-observed herding effect. These complementary results reinforce the actions that have recently been developed to optimize dispersant application by ship. Shipboard equipment for neat dispersant spraying is described. Its main feature is an original nozzle assembly that allows the dispersant to be applied effectively onto the oil at a flow rate that can be widely and very quickly changed according to the estimated oil thickness. An operational treatment procedure is discussed, showing how to map, mark out, prospect and treat oil slicks according to the slick shape, estimated oil thickness, and wind direction.


2014 ◽  
Vol 543-547 ◽  
pp. 4392-4395
Author(s):  
Chuan Zhao ◽  
Guo Ping Cheng

Agriculture is the foundation of a country and is of great significance to the national interest and people's livelihood. Some links in the logistics chain are hampered by poor information dissemination and high costs, which are also the key factors restricting the development of agricultural logistics. Based on a comprehensive summary of logistics theories at home and abroad, the paper analyzes the current state of logistics and identifies problems such as a low level of information sharing, cumbersome processes, high costs, low efficiency and slow reaction to the market. Through an investigation of Third-Party Logistics (TPL), the paper proposes an agricultural service model that applies advanced IT on top of TPL. Finally, through analysis and evaluation of the model, the paper summarizes its advantages, such as complete service functions, strategic alliances with customers, jointly shared risks, large profit margins, and a high level of information sharing.


Author(s):  
Hartmut Krain

The paper describes the history of the application and development of the centrifugal compressor from its introduction until today. It focuses on selected practical and theoretical examples that, in the author's opinion, raised the centrifugal compressor from simple, low-efficiency designs to its current high standard. The main events related to this development, such as the impact of the industrial revolution and the introduction of jet propulsion, are pointed out. The effects of improved theoretical tools, which became available with rising computer capacity, and the impetus of advanced measurement techniques on the centrifugal compressor's improvement are described. A considerable number of references allows the reader to explore the subject in greater depth.


2021 ◽  
Vol 16 (2) ◽  
pp. 129-136
Author(s):  
Kristina Dmitrievna Kryukova ◽  
Valeriya Olegovna Gresis

Among the most urgent problems in sugar beet production in Russia today are irregularities in cultivation technology and low-efficiency crop protection, which lead to a high level of weed infestation in agricultural fields. Developing and identifying the most efficient, selective and accessible herbicides that have low phytotoxicity, do not negatively affect soil chemical characteristics and can be used in sugar beet cultivation is therefore relevant today. The aim of this study was to examine and compare the biological efficiency of various doses and concentrations of one- and two-component graminicides on sugar beet crops against the following weeds: cockspur grass Echinochloa crusgalli (L.) Beauv., wild millet Setaria glauca (L.) Beauv. and couch grass Elytrigia repens (L.) Nevski. The experiment was conducted in the Tula region in 2020. The total field experiment area was 480 m². Application of clethodim + quizalofop-P-ethyl (0.5 l/ha) reduced the number and weight of annual weeds by 64–71 % and of perennial weeds by 54–58 %, which matched the efficiency of clethodim (0.6 l/ha). The efficiency of clethodim + quizalofop-P-ethyl (1.0 l/ha) was higher than that of clethodim (0.6 l/ha), amounting to a 73–87 % reduction in the number of weeds compared to the control, but lower than that of clethodim (1.8 l/ha), which produced an 89–95 % reduction in the number of weeds compared to the control. The highest sugar beet yields were obtained in the variants with clethodim (1.8 l/ha) and the two-component herbicide (1 l/ha), with yield increases of 28 % and 25 %, respectively, compared to the control.

