Binary Matrix Shuffling Filter for Feature Selection in Neuronal Morphology Classification

2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Congwei Sun ◽  
Zhijun Dai ◽  
Hongyan Zhang ◽  
Lanzhi Li ◽  
Zheming Yuan

Correctly classifying neurons is a prerequisite for understanding neuronal function and characteristics. Existing classification techniques are usually based on structural characteristics and employ principal component analysis to reduce feature dimension. In this work, we classify neurons based on neuronal morphology. A new feature selection method named binary matrix shuffling filter was used in neuronal morphology classification. This method, coupled with a support vector machine for implementation, usually selects a small number of features for easy interpretation. The retained features are used to build classification models with support vector classification and two other commonly used classifiers. Compared with the reference feature selection methods, the binary matrix shuffling filter showed the best performance and exhibited broad generalization ability in five random replications of neuron datasets. In addition, the binary matrix shuffling filter was able to distinguish each neuron type from the other types correctly; for each neuron type, private features were also obtained.

Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated by mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated by analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined, and the degree of influence of the input variables on the output variables of the SVM network is calculated based on MIV. The importance weights of the features described by MIV are sorted in descending order. Finally, the SVM classifier is used to implement feature selection according to the classification accuracy of feature combinations, taking the MIV order of the features as a reference. Simulation experiments are carried out with three standard UCI data sets, and the results show that this method not only effectively reduces the feature dimension and achieves high classification accuracy, but also ensures good robustness.
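The redundancy check described in the abstract above rests on mutual information between features. The following is a minimal sketch of that quantity for discrete features, using the standard definition I(X;Y) = Σ p(x,y) log2(p(x,y) / (p(x) p(y))). The function name and the toy feature columns are illustrative assumptions; the paper's own local and overall correlation measures, and its MIV step, are not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Toy columns (made up): f2 duplicates f1, f3 is independent of f1.
f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
f3 = [0, 1, 0, 1, 0, 1, 0, 1]

redundant = mutual_information(f1, f2)    # high: f2 carries the same information as f1
independent = mutual_information(f1, f3)  # zero: f3 tells nothing about f1
```

A feature whose mutual information with an already retained feature is close to that feature's entropy adds little new information, which is the sense in which it could be treated as redundant.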


2016 ◽  
Vol 25 (11) ◽  
pp. 1650143 ◽  
Author(s):  
Jian Wang ◽  
Jian Feng ◽  
Zhiyan Han

Feature selection has become a key step of fault detection. Unfortunately, the class imbalance in the modern semiconductor industry makes feature selection quite challenging. This paper analyzes the challenges and indicates the limitations of the traditional supervised and unsupervised feature selection methods. To cope with the limitations, a new feature selection method named imbalanced support vector data description-radius-recursive feature selection (ISVDD-radius-RFE) is proposed. When selecting features, the ISVDD-radius-RFE has three advantages: (1) ISVDD-radius-RFE is designed to find the most representative feature by finding the real shape of normal samples. (2) ISVDD-radius-RFE can represent the real shape of normal samples more correctly by introducing the discriminant information from fault samples. (3) ISVDD-radius-RFE is optimized for fault detection, where imbalanced data is common. The kernel ISVDD-radius-RFE is also described in this paper. The proposed method is demonstrated through its application to the banana set and the SECOM dataset. The experimental results confirm that ISVDD-radius-RFE and kernel ISVDD-radius-RFE improve the performance of fault detection.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Shuai Zhang ◽  
Renliang Qu ◽  
Pengyan Wang ◽  
Shenghan Wang

Coronavirus disease 2019 (COVID-19) arising from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in a global pandemic since its first report in December 2019. So far, SARS-CoV-2 nucleic acid detection has been deemed the gold standard of COVID-19 diagnosis. However, this detection method often leads to false negatives, resulting in missed COVID-19 diagnoses. Therefore, there is an urgent need to find new biomarkers to increase the accuracy of COVID-19 diagnosis. To explore new biomarkers of COVID-19 in this study, expression profiles were first obtained from the GEO database. On this basis, 500 feature genes were screened by the minimum-redundancy maximum-relevancy (mRMR) feature selection method. Afterwards, the incremental feature selection (IFS) method was used to choose the classifier with the best performance from support vector machine (SVM) classifiers built on different sets of feature genes. The corresponding 66 feature genes were set as the optimal feature genes. Lastly, the optimal feature genes were subjected to GO functional enrichment analysis, principal component analysis (PCA), and protein-protein interaction (PPI) network analysis. Overall, it was posited that the 66 feature genes could effectively classify positive and negative COVID-19 cases and serve as new biomarkers of the disease.
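The incremental feature selection step in the abstract above can be sketched as follows: given a list of features already ranked (by mRMR in the paper), build one classifier per prefix of the list and keep the prefix that scores best. This is only an illustration under stated assumptions. The ranking is taken as given, a leave-one-out nearest-centroid classifier stands in for the SVM, and all function names and data values are hypothetical.

```python
def nearest_centroid_loo_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    correct = 0
    for i in range(len(rows)):
        # Build per-class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, lab) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            vec = [row[f] for f in feature_idx]
            if lab not in sums:
                sums[lab] = [0.0] * len(vec)
                counts[lab] = 0
            sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
            counts[lab] += 1
        target = [rows[i][f] for f in feature_idx]

        def dist(lab):
            # Squared Euclidean distance from the held-out sample to a class centroid.
            return sum((s / counts[lab] - t) ** 2 for s, t in zip(sums[lab], target))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == labels[i]
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranked_features,
                                  score=nearest_centroid_loo_accuracy):
    """Score every prefix of the ranked list; return (best prefix, its score, all scores)."""
    scores = [score(rows, labels, ranked_features[:k])
              for k in range(1, len(ranked_features) + 1)]
    # Highest score wins; on ties, prefer the shorter prefix.
    best = max(range(len(scores)), key=lambda k: (scores[k], -k))
    return ranked_features[:best + 1], scores[best], scores


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 5.0], [0.2, -5.0], [0.1, 4.0], [1.0, -4.0], [1.2, 5.0], [1.1, -5.0]]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]
selected, best_score, all_scores = incremental_feature_selection(rows, labels, [0, 1])
```

Preferring the shorter prefix on ties is a design choice of this sketch, not something stated in the abstract.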


2007 ◽  
Vol 03 (03) ◽  
pp. 331-347 ◽  
Author(s):  
SHASHA LIAO ◽  
MINGHU JIANG

Feature selection is an important part of automatic text classification. In this paper, we use a Chinese semantic dictionary, Hownet, to extract concepts from words as the feature set, because concepts better reflect the meaning of the text. However, as the concept definition in the dictionary sometimes cannot express the word properly, we define the expression power of every sememe and every definition of the word in further processing, and define the relation degree between the sememe and the definition. A threshold is set in the sememe tree, sememes carrying little information are filtered out, and words whose definitions are weak in expression power are retained. By this method, we construct a combined feature set that consists of both sememes and Chinese words. The values of sememes are given according to their expression power and relation to the word. By comparing seven feature weighting methods in text classification, we propose a CHI-MCOR weighting method according to the weighting theories and classification precision. Experimental results show that if the words are extracted properly, not only is the feature dimension smaller but the classification precision is also higher. Our method strikes a good balance between the features that occur frequently in the corpus and those that occur in only one category, and the difference in classification precision among categories is small.
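The abstract above does not spell out the CHI-MCOR formula, so it is not reproduced here. As background for the CHI part only, the sketch below computes the standard chi-square statistic of a term against a category from a 2x2 document-count table. The function name, argument names, and counts are illustrative assumptions.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term/category 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - c * b) ** 2 / denom


# Toy counts (made up): a term that occurs only inside the category,
# and a term spread evenly across both.
category_specific = chi_square(10, 0, 0, 10)
evenly_spread = chi_square(5, 5, 5, 5)
```

A term concentrated in one category scores high, while a term distributed evenly across categories scores zero, which is why the statistic is commonly used to weight or rank text features.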


2018 ◽  
Vol 30 (5) ◽  
pp. 706-716 ◽  
Author(s):  
Saori Miyajima ◽  
Takayuki Tanaka ◽  
Natsuki Miyata ◽  
Mitsunori Tada ◽  
Masaaki Mochimaru ◽  
...  

As the demand for nursing care services is growing, the physical burden involved in caregiving has drawn widespread attention. To mitigate the physical burden in caregiving, we have to recognize what kind of work and problems are involved in each caregiving task. To identify the problems involved in caregiving, we need to recognize the work and analyze its workload. Aiming to reduce the burden on the waist during caregiving tasks, we are developing inertial sensor suits for measuring the working motions. With the developed method, the burden on the waist is estimated from the waist posture. Considering its use in practical caregiving sites, the number of inertial sensors should be the minimum necessary, which depends on the number of body parts whose posture is to be measured. In this study, we select the body parts to achieve the two above-mentioned goals: to recognize the work involved in caregiving and to capture the waist posture. A support vector machine (SVM) is used to recognize the work. The conventional method of selecting the features on which to recognize the work considers only the recognition accuracy and does not sufficiently meet the needs for measuring the postures. Therefore, we propose a new feature-selection method, which can evaluate the waist-posture measuring accuracy and can make forward feature selections in the same manner as the conventional wrapper method. We have verified the effectiveness of the proposed method by measuring simple simulated work motions.
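The forward feature selection mentioned in the abstract above follows the usual greedy wrapper pattern: start from an empty set, add the candidate that most improves a score, and stop when nothing improves it. The sketch below shows only that generic pattern. The scoring function, the body-part names, and every number are hypothetical; the paper's actual criterion, which also evaluates waist-posture measuring accuracy, is not given in the abstract.

```python
def forward_select(candidates, score, max_size):
    """Greedy forward selection: repeatedly add the candidate that most improves score."""
    selected = []
    best = float("-inf")
    while len(selected) < max_size:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Evaluate each one-element extension of the current subset.
        trial_score, trial = max((score(selected + [c]), c) for c in remaining)
        if trial_score <= best:
            break  # no remaining candidate improves the score
        selected.append(trial)
        best = trial_score
    return selected, best


# Hypothetical per-sensor gains and a per-sensor cost, for illustration only.
gains = {"waist": 0.5, "thigh": 0.3, "upper_arm": 0.05, "head": 0.01}


def toy_score(subset):
    # Benefit of the chosen body parts minus a penalty for each extra sensor.
    return sum(gains[s] for s in subset) - 0.1 * len(subset)


chosen, chosen_score = forward_select(list(gains), toy_score, max_size=4)
```

With these made-up numbers the search stops after two body parts, because a third sensor costs more than it contributes.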


2021 ◽  
Vol 11 (7) ◽  
pp. 3273
Author(s):  
Joana Morgado ◽  
Tania Pereira ◽  
Francisco Silva ◽  
Cláudia Freitas ◽  
Eduardo Negrão ◽  
...  

The evolution of personalized medicine has changed the therapeutic strategy from classical chemotherapy and radiotherapy to a genetic modification targeted therapy, and although biopsy is the traditional method to genetically characterize lung cancer tumors, it is an invasive and painful procedure for the patient. Nodule image features extracted from computed tomography (CT) scans have been used to create machine learning models that predict gene mutation status in a noninvasive, fast, and easy-to-use manner. However, recent studies have shown that radiomic features extracted from an extended region of interest (ROI) beyond the tumor might be more relevant for predicting the mutation status in lung cancer, and consequently may be used to significantly decrease the mortality rate of patients battling this condition. In this work, we investigated the relation between image phenotypes and the mutation status of Epidermal Growth Factor Receptor (EGFR), the most frequently mutated gene in lung cancer with several approved targeted therapies, using radiomic features extracted from the lung containing the nodule. A variety of linear, nonlinear, and ensemble predictive classification models, along with several feature selection methods, were used to classify the binary outcome of wild-type or mutant EGFR mutation status. The results show that a comprehensive approach using an ROI that included the lung with the nodule can capture relevant information and successfully predict the EGFR mutation status with increased performance compared to local nodule analyses. Linear Support Vector Machine, Elastic Net, and Logistic Regression, combined with the Principal Component Analysis feature selection method implemented with 70% of variance in the feature set, were the best-performing classifiers, reaching Area Under the Curve (AUC) values ranging from 0.725 to 0.737. This approach, which exploits a holistic analysis, indicates that information from more extensive regions of the lung containing the nodule allows a more complete lung cancer characterization and should be considered in future radiogenomic studies.
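One plausible reading of "Principal Component Analysis ... implemented with 70% of variance" in the abstract above is that the leading principal components are kept until their cumulative explained variance reaches 70%. The sketch below illustrates that selection rule only. It assumes the eigenvalues of the feature covariance matrix are already available; the function name and the example eigenvalues are made up.

```python
def components_for_variance(eigenvalues, threshold=0.70):
    """Smallest number of leading components whose cumulative variance ratio reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Toy eigenvalues (made up): cumulative variance ratios are 0.5, 0.8, 0.9, 1.0.
toy_eigenvalues = [5.0, 3.0, 1.0, 1.0]
n_components = components_for_variance(toy_eigenvalues)
```

Raising the threshold keeps more components: with the same toy eigenvalues, a 95% threshold would require all four.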


2005 ◽  
Vol 02 (04) ◽  
pp. 353-365 ◽  
Author(s):  
YONGSHENG OU ◽  
HUIHUAN QIAN ◽  
XINYU WU ◽  
YANGSHENG XU

This paper introduces a real-time video surveillance system which can track people and detect abnormal human behaviors. In the blob detection part, an optical flow algorithm for crowded environments is studied experimentally and compared with the traditional subtraction approach. The different approaches in segmentation and tracking enable the system to track persons when they change movement unpredictably under occlusion. We developed two methods for abnormal human behavior analysis. The first one employs Principal Component Analysis for feature selection and Support Vector Machine for classification of human behaviors. The proposed feature selection method is based on the border information of four consecutive blobs. The second approach computes optical flow to obtain the velocity of each pixel for determining whether a human behavior is normal or not. Both algorithms are successfully developed in crowded environments to detect the following abnormal human behaviors: (1) people running in a crowded environment; (2) a person falling down while most are walking or standing; (3) a person carrying an abnormal bar in a square; (4) a person waving a hand in the crowd. Experimental results demonstrate that these two methods are robust in detecting abnormal human behaviors.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Weidong Cheng ◽  
Tianyang Wang ◽  
Weigang Wen ◽  
Jianyong Li ◽  
Robert X. Gao

The selection of fewer or more representative features from multidimensional features is important when the artificial neural network (ANN) algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine the feature that is more suitable for classification. Moreover, this method is constructed on the basis of the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of the bearing fault diagnosis. In detail, (1) 70-dimensional all waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2) the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3) 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA) algorithm and a back propagation (BP) network is constructed, and (4) the validity of the ranking result is proven by training this BP network with these seven groups of 10-dimensional features and by comparing the corresponding recognition rates. The results prove that the features with larger MIVAR value can lead to higher recognition rates.

