Candidate Feature Extraction and Categorization for Unstructured Text Document

Author(s): Prajakta P Shelke, Aditya A Pardeshi

Within phrases, words carry crucial information that drives the feature extraction process. Established techniques for this task have serious limitations: they extract features poorly and ignore the grammatical structure of phrases, so the extracted features are of low quality. To overcome this problem, a system is proposed that generates a parse tree for the input sentence and subsequently cuts it down into sub-trees. Branches of the tree are extracted using part-of-speech (POS) labelling to identify candidate phrases, and filtering is recommended to avoid redundant phrases. Finally, machine learning is used for the feature categorization stage. The results illustrate the effectiveness of the approach.
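The POS-based candidate extraction step can be illustrated with a minimal sketch (plain Python, not the authors' implementation): given a POS-tagged sentence, maximal runs matching the pattern (adjective)* (noun)+ are collected as candidate phrases. The tags follow the Penn Treebank convention (JJ = adjective, NN/NNS = noun) and the example sentence is illustrative.

```python
def extract_candidates(tagged):
    """Collect maximal runs matching (JJ)* (NN|NNS)+ as candidate phrases."""
    candidates, run = [], []

    def flush():
        # A run only counts as a candidate if it contains at least one noun.
        if any(t in ("NN", "NNS") for _, t in run):
            candidates.append(" ".join(w for w, _ in run))
        run.clear()

    for word, tag in tagged:
        noun_seen = any(t in ("NN", "NNS") for _, t in run)
        if tag == "JJ":
            if noun_seen:
                flush()              # an adjective after nouns starts a new phrase
            run.append((word, tag))
        elif tag in ("NN", "NNS"):
            run.append((word, tag))
        else:
            flush()                  # pattern broken: emit and reset
    flush()
    return candidates

sentence = [("the", "DT"), ("established", "JJ"), ("extraction", "NN"),
            ("techniques", "NNS"), ("ignore", "VBP"),
            ("grammatical", "JJ"), ("structure", "NN")]
print(extract_candidates(sentence))
# → ['established extraction techniques', 'grammatical structure']
```

In a full pipeline the tags would come from a parser or POS tagger run over the sub-trees; here they are supplied by hand to keep the sketch self-contained.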

2020, Vol 8 (5), pp. 2266-2276

In earlier days, people used speech purely as a means of communication, conveying meaning to a listener through voice and expression. For machines to take part in such interaction, however, machine learning and related methods are needed for speech recognition. With voice gaining use and significance as a biometric, speech has become an important modality in its own right. In this article, we survey a variety of speech and emotion recognition techniques and compare several methods based on existing, mostly speech-based, algorithms. We list and distinguish speech technologies with respect to specifications, databases, classification, feature extraction, enhancement, segmentation and the overall process of speech emotion recognition.
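Two classic frame-level features that appear in many speech emotion recognition front ends are short-time energy and zero-crossing rate. The sketch below (plain Python; frame length, hop size and the signal are illustrative choices, not taken from the surveyed systems) shows how a signal is split into frames and each frame reduced to these two numbers.

```python
def frames(signal, length, hop):
    """Split a sample sequence into overlapping (or abutting) frames."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

signal = [0.0, 0.5, -0.5, 0.5, -0.5, 0.0, 0.1, 0.1]
for f in frames(signal, length=4, hop=4):
    print(short_time_energy(f), zero_crossing_rate(f))
```

Real systems typically add spectral features such as MFCCs before classification; these two time-domain features keep the sketch dependency-free.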


Electrocardiography (ECG) analyzes the electrical activity of the heart over a period of time, and detailed information about the condition of the heart is obtained by analyzing the ECG signal. The wavelet transform and the fast Fourier transform are among the methods used to diagnose cardiac disease. This paper presents a survey on ECG signal analysis and related studies on arrhythmic and non-arrhythmic data. We discuss an efficient feature extraction process for the electrocardiogram in which, based on position and priority, the six best P-QRS-T fragments are studied. The survey examines system outcomes using various machine learning classification algorithms for feature extraction and analysis of ECG signals; Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) are the most important algorithms used for this purpose. Several publicly available data sets are used for arrhythmia analysis, among which the MIT-BIH ECG-ID database is the most common. Drawbacks and limitations are also discussed, from which future challenges and concluding remarks are drawn.
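Of the classifiers named above, KNN is the simplest to sketch. The plain-Python example below classifies a query beat from pre-extracted feature vectors (think intervals or amplitudes of P-QRS-T fragments); the feature values and labels are invented for illustration and are not drawn from the MIT-BIH database.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature vector.
    Returns the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy feature vectors: (QRS duration in s, R-peak amplitude in mV)
train = [([0.12, 0.8], "normal"), ([0.11, 0.9], "normal"),
         ([0.30, 0.4], "arrhythmia"), ([0.28, 0.5], "arrhythmia"),
         ([0.13, 0.7], "normal")]
print(knn_predict(train, [0.29, 0.45]))   # → arrhythmia
```

In practice the feature vectors would be standardized first, since KNN's distance metric is sensitive to feature scale.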


Author(s): Anindita Das Bhattacharjee

The accessibility problem is relevant to audiovisual information, where enormous amounts of data must be explored and processed. Most solutions to this type of problem point towards a recurring need to extract applicable information features for a given content domain, and the feature extraction process involves two complicated tasks: deciding which features to use, and then extracting them. Good features are expected to exhibit certain properties: repeatability, distinctiveness, locality, quantity, accuracy, efficiency and invariance. Different feature extraction techniques are described. The chapter surveys feature extraction and image formation, considering feature extraction from both images and video. Feature extraction is a common mechanism with significant contributions to machine learning, pattern recognition and image processing: it starts from an initial set of measured data and constructs derived, informative values that are non-redundant in nature.
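A concrete example of a local image feature with the locality property described above is the per-pixel gradient magnitude, a building block of many descriptor pipelines. The sketch below (plain Python; the 4x4 intensity grid is illustrative) computes it with simple forward differences.

```python
def gradient_magnitude(img):
    """Per-pixel gradient magnitude via forward finite differences,
    with edge pixels clamped to the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]   # horizontal difference
            gy = img[min(y + 1, h - 1)][x] - img[y][x]   # vertical difference
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

img = [[0, 0, 10, 10],
       [0, 0, 10, 10],
       [0, 0, 10, 10],
       [0, 0, 10, 10]]
mag = gradient_magnitude(img)
print(mag[0])   # strong response only at the vertical edge
# → [0.0, 10.0, 0.0, 0.0]
```

The strong response exactly at the intensity step, and zero elsewhere, is what makes such a feature distinctive and repeatable under the terminology used in the chapter.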


Healthcare, 2020, Vol 8 (1), p. 34
Author(s): Sabyasachi Chakraborty, Satyabrata Aich, Hee-Cheol Kim

Parkinson’s disease is caused by the progressive loss of dopaminergic neurons in the substantia nigra pars compacta (SNc). With the exponential growth of the aging population across the world, the number of people affected by the disease is increasing, imposing a huge economic burden on governments. To date, however, no therapy or treatment has been found that can completely eradicate the disease. Early detection of Parkinson’s disease is therefore very important, so that the progressive loss of dopaminergic neurons can be controlled and patients can be given a better life. In this study, 3T T1-MRI scans were collected from 906 subjects: 203 control subjects, 66 prodromal subjects and 637 Parkinson’s disease patients. To analyze the MRI scans for the detection of neurodegeneration and Parkinson’s disease, eight subcortical structures were segmented from the acquired scans using atlas-based segmentation. Feature extraction was then performed on the eight segmented subcortical structures to obtain textural, morphological and statistical features, producing an exhaustive set of 107 features for each MRI scan. A two-level feature selection procedure, leveraging correlation analysis and recursive feature elimination, was therefore implemented to find the best possible feature set for the detection of Parkinson’s disease; it ultimately yielded the 20 best-performing features out of the extracted 107. All the selected features were then used to train machine learning algorithms, and a comparative analysis was performed between four different machine learning algorithms based on the selected performance metrics. In the end, the artificial neural network (multi-layer perceptron) performed best, providing an overall accuracy of 95.3%, overall recall of 95.41%, overall precision of 97.28% and an F1-score of 94%.
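The correlation-analysis level of such a two-level selection can be sketched as a greedy filter: keep a feature only if its absolute Pearson correlation with every already-kept feature stays below a threshold. This is a generic sketch, not the authors' pipeline, and the feature names and values below are invented.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_filter(features, threshold=0.95):
    """features: dict name -> list of values across scans.
    Greedily keeps one representative of each highly-correlated group."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, v)) < threshold for v in kept.values()):
            kept[name] = values
    return list(kept)

features = {
    "volume":    [1.0, 2.0, 3.0, 4.0],
    "volume_mm": [10.0, 20.0, 30.0, 40.0],   # redundant rescaled copy
    "texture":   [3.0, 1.0, 4.0, 2.0],
}
print(correlation_filter(features))   # → ['volume', 'texture']
```

The survivors of this filter would then be fed to recursive feature elimination, which repeatedly retrains a model and drops the least important remaining feature.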


2020, Vol 26 (1), pp. 1-9
Author(s): Aditya Kakde, Nitin Arora, Durgansh Sharma, Subhash Chander Sharma

Abstract
According to the Google I/O 2018 keynotes, artificial intelligence, which includes machine learning and deep learning, will in future evolve mostly in the healthcare domain. As many subdomains fall under healthcare, the proposed paper concentrates on two of them: breast cancer and pneumonia. Today, merely classifying diseases is not enough; the system should also be able to classify a particular patient’s disease. This paper therefore highlights the importance of multispectral classification, meaning the collection of several monochrome images of the same scene, which can prove an important process in healthcare for determining whether a patient is suffering from a specific disease. Convolutional layers followed by pooling layers are used for the feature extraction process, and fully connected layers followed by a regression layer are used for classification.
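The convolution-then-pooling feature extraction step can be sketched in plain Python on a small 2D input; the kernel and input values are illustrative, and a real network would learn the kernel weights and apply a nonlinearity between the layers.

```python
def conv2d(img, kernel):
    """Valid (no-padding) 2D cross-correlation of img with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w)] for y in range(h)]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling of a feature map."""
    return [[max(fmap[y][x], fmap[y][x + 1],
                 fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]

img = [[1, 0, 0, 0, 1],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [1, 0, 0, 0, 1]]
kernel = [[1, 0], [0, 1]]          # responds to diagonal patterns
fmap = conv2d(img, kernel)         # 4x4 feature map
print(max_pool2x2(fmap))           # 2x2 pooled summary
# → [[2, 1], [1, 2]]
```

The pooled map is what a fully connected classification head would consume after flattening.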


Author(s): Vahid R. Sabzevari, Asad Azemi, Morteza Khademi, Hossein Gholizade, Armin Kiani, ...

The goal of this article is to optimize the feature extraction process using ICA and the wavelet transform, apply the obtained feature set to several different machine learning schemes, and compare their performances. The article is structured as follows. Section 2.0 describes our proposed method for cardiac arrhythmia detection. Section 3.0 gives an overview of the different classifier types used in this work. Sections 4.0 and 5.0 summarize our simulation scheme and results. Finally, Section 6.0 presents the concluding remarks.
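The wavelet side of such a feature extractor can be illustrated with one level of an unnormalized Haar decomposition, which compacts a signal into pairwise averages (coarse shape) and differences (detail). This is a generic sketch, not the authors' exact transform, and the input samples are illustrative.

```python
def haar_step(signal):
    """One Haar decomposition level: pairwise averages and differences.
    Assumes an even-length input."""
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

approx, detail = haar_step([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
print(approx)   # → [5.0, 11.0, 7.0, 5.0]
print(detail)   # → [-1.0, -1.0, 1.0, 0.0]
```

Applying `haar_step` recursively to the approximation yields a multi-level decomposition; the resulting coefficients (possibly after ICA) form the feature vector handed to the classifiers.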


2021, Vol 40 (1)
Author(s): Matthew Konnik, Bahar Ahmadi, Nicholas May, Joseph Favata, Zahra Shahbazi, ...

Abstract
X-ray computed tomography (CT) is a powerful technique for non-destructive volumetric inspection of objects and is widely used for studying the internal structures of a large variety of sample types. The raw data obtained through X-ray CT is a gray-scale 3D array of voxels, which must undergo a geometric feature extraction process before it can be used for interpretation. Such feature extraction is conventionally done manually, but with the ever-increasing size of image data and the interest in identifying ever more miniature features, automated feature extraction methods are sought. Since conventional computer-vision-based methods, which attempt to segment images into partitions using techniques such as thresholding, are often only useful for aiding manual feature extraction, machine-learning-based algorithms are becoming popular for developing fully automated feature extraction processes. Nevertheless, machine learning algorithms require a huge pool of labeled data for proper training, which is often unavailable. We propose to address this shortage through a data synthesis procedure: fabricating miniature features of known geometry, position and orientation on thin silicon wafer layers using a femtosecond laser machining system, stacking these layers to construct a 3D object with internal features, and finally obtaining the X-ray CT image of the resulting object. Because the exact geometry, position and orientation of the fabricated features are known, the X-ray CT image is inherently labeled and ready to be used for training machine learning algorithms for automated feature extraction.
Through several examples, we showcase: (1) the capability of synthesizing features of arbitrary geometries and their corresponding labeled images; and (2) the use of the synthesized data for training machine-learning-based shape classifiers and feature parameter extractors.
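The "inherently labeled" idea can be mimicked entirely in software: generate a voxel volume containing a feature of known analytic geometry, so the label is available by construction. The sketch below (plain Python; the grid size, the noise-free binary intensity model, and the spherical feature are illustrative stand-ins for the paper's laser-fabricated features) produces one such volume-label pair.

```python
def synthetic_volume(size, center, radius):
    """Return (volume, label): a size^3 voxel grid where voxels inside
    the sphere are bright (255) and everything else is background (0)."""
    cx, cy, cz = center
    volume = []
    for z in range(size):
        plane = []
        for y in range(size):
            row = []
            for x in range(size):
                inside = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
                row.append(255 if inside else 0)
            plane.append(row)
        volume.append(plane)
    # The label comes for free: geometry, position and size are known.
    label = {"shape": "sphere", "center": center, "radius": radius}
    return volume, label

vol, label = synthetic_volume(size=8, center=(4, 4, 4), radius=2)
bright = sum(v for plane in vol for row in plane for v in row) // 255
print(label["shape"], bright)   # the shape name and its voxel count
```

Generating many such pairs with varied shapes and poses gives a training set for a voxel-based shape classifier without any manual annotation, which is the core of the proposed approach.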


2018, Vol 6 (2), pp. 254-274
Author(s): Lukie Perdanasari, Riyanto Sigit, Achmad Basuki

It is important that a company uses the right means to recruit employees with the personal characteristics it needs. Nowadays, the techniques for responding to psychological tests of personal characteristics are widely understood by most job applicants, making it difficult to learn their true personality. Graphology is a way to identify a person’s characteristics by analyzing the handwriting in a text document produced by the applicant. Two types of text document are obtained from each applicant, written at different ages and at different times. The graphological methods used in this research for identifying handwriting are preprocessing and feature extraction. The preprocessing stage uses projection integrals, shear transformations and template matching, while the feature extraction stage applies 10 features: margins, line spacing, space between words, size of writing, style, zone, direction of writing, slope of writing, width of writing and shape of the letters. Experimental results from five writers show a writing identification accuracy of 82% and a personality identification accuracy of 67.4%.
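The projection-integral step named in the preprocessing stage can be sketched directly: sum the ink pixels along each row and each column of a binarized handwriting image. The row profile exposes line positions and spacing; the column profile exposes margins and word gaps. The 4x6 bitmap below is illustrative.

```python
def projections(bitmap):
    """Row and column ink sums of a binary image (1 = ink, 0 = paper)."""
    horizontal = [sum(row) for row in bitmap]        # one value per row
    vertical = [sum(col) for col in zip(*bitmap)]    # one value per column
    return horizontal, vertical

bitmap = [[0, 1, 1, 1, 1, 0],
          [0, 1, 0, 0, 1, 0],
          [0, 1, 0, 0, 1, 0],
          [0, 1, 1, 1, 1, 0]]
h, v = projections(bitmap)
print(h)   # → [4, 2, 2, 4]
print(v)   # → [0, 4, 2, 2, 4, 0]
```

The zero columns at both ends of the vertical profile are exactly how margin features would be read off in such a pipeline.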


2020, pp. 1-12
Author(s): Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty is that large errors arise during text-to-speech feature recognition, which makes it hard to apply English text-to-speech conversion algorithms in practical systems. To improve the efficiency of English text-to-speech conversion based on a machine learning algorithm, this article labels the original voice waveform with pitch, modifies the rhythm through PSOLA, and uses the C4.5 algorithm to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of pronunciation discrimination based on part-of-speech rules and of HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructed a system model. In addition, the waveform stitching method and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, labeling can be done by machine learning methods. Finally, this study evaluates and analyzes the performance of the algorithm through control experiments. The results show that the proposed algorithm performs well and has practical value.
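The split criterion at the heart of a C4.5-style decision tree is information gain (C4.5 proper normalizes it into a gain ratio). The sketch below computes it over toy polyphone examples in which the word's part of speech decides its pronunciation; the examples and pronunciation strings are illustrative, not from the article's training data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """examples: list of (features_dict, label) pairs.
    Gain = entropy before the split minus weighted entropy after it."""
    labels = [label for _, label in examples]
    by_value = {}
    for feats, label in examples:
        by_value.setdefault(feats[attribute], []).append(label)
    remainder = sum(len(ls) / len(examples) * entropy(ls)
                    for ls in by_value.values())
    return entropy(labels) - remainder

# "record" pronounced differently as a noun vs. a verb
examples = [({"pos": "NN"}, "REH-kord"),
            ({"pos": "NN"}, "REH-kord"),
            ({"pos": "VB"}, "rih-KORD"),
            ({"pos": "VB"}, "rih-KORD")]
print(information_gain(examples, "pos"))   # → 1.0 (a perfect split)
```

A full C4.5 trainer would evaluate this gain (as a gain ratio) for every candidate context attribute, split on the best one, and recurse on each branch.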

