Comparison of Geographical Traceability of Wild and Cultivated Macrohyporia cocos with Different Data Fusion Approaches

Journal of Analytical Methods in Chemistry ◽

10.1155/2021/5818999 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Li Wang ◽

Qinqin Wang ◽

Yuanzhong Wang ◽

Yunmei Wang

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Random Forest ◽

Data Fusion ◽

Economic Value ◽

Principal Component ◽

Component Analysis ◽

Chemical Components ◽

Significant Difference ◽

Geographical Traceability

Poria, which originates from the dried sclerotium of Macrohyporia cocos, is an edible traditional Chinese medicine with high economic value. Because wild and cultivated M. cocos differ significantly in quality, this study traced the geographical origin of the fungus separately for wild and cultivated samples. Data fusion, a potentially useful strategy, has rarely been applied or discussed in the geographical traceability of M. cocos, so multiple data fusion approaches were compared. Supervised pattern recognition techniques, such as partial least squares discriminant analysis (PLS-DA) and random forest, were employed. Five types of data fusion covering low-, mid-, and high-level strategies were performed. Two feature extraction approaches were considered for data fusion: variable selection with Boruta, a random forest-based algorithm, and dimension reduction to principal components by principal component analysis (PCA). The results indicate the following: (1) Wild and cultivated samples differed in both the content of vital chemical components and their fingerprints. (2) Wild samples required data fusion for origin traceability, and the accuracy on the validation set was 95.24%. (3) Boruta outperformed PCA in feature extraction. (4) The mid-level Boruta PLS-DA model took full advantage of information synergy and showed the best performance. This study showed that both the geographical traceability and the optimal identification methods differed between cultivated and wild samples, and that data fusion is a promising technique for geographical identification.
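The low- and mid-level fusion strategies named in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' pipeline: the two data blocks (`spectra`, `chromatogram`) are random placeholders, the helper names are hypothetical, and PCA scores stand in for the feature extraction step (Boruta selection is not shown).

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centred X onto its first n_components principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def low_level_fusion(blocks):
    """Low-level fusion: autoscale each block, then concatenate all variables."""
    scaled = [(B - B.mean(axis=0)) / B.std(axis=0) for B in blocks]
    return np.hstack(scaled)

def mid_level_fusion(blocks, n_components):
    """Mid-level fusion: extract features per block, then concatenate them."""
    return np.hstack([pca_scores(B, n_components) for B in blocks])

# Placeholder data: 20 samples measured by two hypothetical techniques.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(20, 300))       # e.g. spectral variables (assumed)
chromatogram = rng.normal(size=(20, 50))   # e.g. chromatographic variables (assumed)

low = low_level_fusion([spectra, chromatogram])
mid = mid_level_fusion([spectra, chromatogram], n_components=5)
print(low.shape, mid.shape)
```

Either fused matrix would then be passed to a classifier such as PLS-DA or random forest. The mid-level matrix is much narrower, which is the usual motivation for extracting features before fusing.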


Comparison of Geographical Traceability of Wild and Cultivated Macrohyporia Cocos With Different Data Fusion Approaches

10.21203/rs.3.rs-371769/v1 ◽

2021 ◽

Author(s):

Qinqin Wang ◽

Yuan-Zhong Wang ◽

Yunmei Wang

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Random Forest ◽

Data Fusion ◽

Principal Component ◽

Component Analysis ◽

Level Data ◽

Significant Difference ◽

High Level ◽

Geographical Traceability

Abstract. Background: Poria, which originates from the dried sclerotium of Macrohyporia cocos, is an edible traditional Chinese medicine with high economic value. Because wild and cultivated M. cocos differ significantly in quality, the study traced the geographical origin of the fungus separately for wild and cultivated samples. Data fusion, a potentially useful strategy, has rarely been applied or discussed in the geographical traceability of M. cocos, so multiple data fusion approaches were compared. Methods: Supervised pattern recognition techniques, such as partial least squares discriminant analysis (PLS-DA) and random forest, were employed. Five types of data fusion covering low-, mid-, and high-level strategies were performed. Two feature extraction approaches were considered for data fusion: variable selection with Boruta, a random forest-based algorithm, and dimension reduction to principal components by principal component analysis (PCA). Results: (1) Wild and cultivated samples differed in both the content of vital chemical components and their fingerprints. (2) Cultivated samples from different origins could be easily identified by Fourier transform infrared spectroscopy or liquid chromatography alone, whereas wild samples required data fusion. (3) Boruta outperformed PCA in feature extraction. (4) Mid-level-Boruta outperformed Mid-level-PCA, low-level and high-level data fusion, and the individual techniques. The Mid-level-Boruta PLS-DA model took full advantage of information synergy and showed the best performance. Conclusions: Both the geographical traceability and the optimal identification methods differed between cultivated and wild samples, and data fusion is a promising technique for geographical identification.
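High-level fusion combines the decisions of separately trained models instead of their input variables. The sketch below uses majority voting, which is one common decision rule. The abstract does not state which rule the authors used, so the rule, the function name, and the example labels are all assumptions.

```python
import numpy as np

def majority_vote(predictions):
    """Fuse class labels of shape (n_models, n_samples) by majority vote."""
    predictions = np.asarray(predictions)
    fused = []
    for column in predictions.T:          # one column per sample
        labels, counts = np.unique(column, return_counts=True)
        # np.unique sorts labels, so a tie resolves to the smallest label.
        fused.append(labels[np.argmax(counts)])
    return np.array(fused)

# Hypothetical origin labels predicted by three single-technique models.
model_predictions = [
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [1, 1, 2, 0],
]
fused = majority_vote(model_predictions)
print(fused)  # [0 1 2 1]
```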


Klasifikasi Sinyal Emg Pada Otot Tungkai Selama Berjalan Menggunakan Random Forest [Classification of EMG Signals in Leg Muscles During Walking Using Random Forest]

Jurnal Inotera ◽

10.31572/inotera.vol1.iss1.2016.id7 ◽

2017 ◽

Vol 1 (1) ◽

pp. 51

Author(s):

Darma Setiawan Putra ◽

Adhi Dharma Wibawa ◽

Mauridhi Hery Purnomo

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Random Forest ◽

Gait Cycle ◽

Cross Validation ◽

Vastus Lateralis ◽

Principal Component ◽

Rectus Femoris ◽

Component Analysis ◽

Gastrocnemius Lateralis

An electromyography (EMG) signal is an electrical signal present in muscle tissue during active movement. The way a person walks is determined by the structure of their muscles and bones, so gait is unique and can be used as biometric data. In this study, we classify EMG data from eight leg muscles recorded during normal walking trials: Rectus Femoris, Vastus Lateralis, Vastus Medialis, Biceps Femoris, Semitendinosus, Gastrocnemius Lateralis, Gastrocnemius Medialis, and Tibialis Anterior. Six subjects were asked to walk in the GaitLab laboratory with eight EMG electrodes attached to their muscles. Each subject walked for one gait cycle, and data were recorded three times, giving a total of 18 EMG samples for classification. Graph feature extraction and principal component analysis were used to extract features from the EMG data. Random Forest was used to classify the EMG data by subject. Training and testing used cross validation (CV). The classification accuracy was 88.88% with graph feature extraction and 72.22% with principal component analysis. These results show that EMG data from eight leg muscles during walking can be used for gait-based biometric identification.
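With only 18 samples (six subjects, three recordings each), cross validation decides how much data each model sees. The sketch below shows leave-one-out cross validation on synthetic data of the same shape. Several things here are assumptions: the abstract does not say which CV scheme was used, a nearest-centroid classifier is swapped in for Random Forest to keep the example dependency-free, and the feature values are random placeholders.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class whose training mean is closest."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance matrix of shape (n_test, n_classes).
    dist = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and count correct labels."""
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        pred = nearest_centroid_predict(X[train], y[train], X[i:i + 1])
        hits += int(pred[0] == y[i])
    return hits / len(y)

rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 3)                 # 6 subjects x 3 recordings = 18
X = 10.0 * y[:, None] + rng.normal(scale=0.1, size=(18, 4))  # placeholder features

accuracy = leave_one_out_accuracy(X, y)
print(f"{accuracy:.2%}")
```

With 18 samples, accuracy can only move in steps of 1/18, so the reported 88.88% and 72.22% are consistent with 16 and 13 correctly classified samples respectively.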


Towards fine-scale population stratification modeling based on kernel principal component analysis and random forest

Genes & Genomics ◽

10.1007/s13258-021-01057-4 ◽

2021 ◽

Author(s):

Weiwen Zhang ◽

Lianglun Cheng ◽

Guoheng Huang

Keyword(s):

Principal Component Analysis ◽

Random Forest ◽

Population Stratification ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Fine Scale ◽

Scale Population


Feature Extraction of Loader Operation Based on Kernel Principal Component Analysis

2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP) ◽

10.1109/icsp51882.2021.9408684 ◽

2021 ◽

Author(s):

REN Yu ◽

HUI Ji-zhuang ◽

SHI Ze ◽

ZHANG Ze-yu ◽

Zhang Xu-hui ◽

...

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis


Two-dimensional principal component analysis based on Schatten p-norm for image feature extraction

Journal of Visual Communication and Image Representation ◽

10.1016/j.jvcir.2015.07.011 ◽

2015 ◽

Vol 32 ◽

pp. 55-62 ◽

Cited By ~ 9

Author(s):

Haishun Du ◽

Qingpu Hu ◽

Manman Jiang ◽

Fan Zhang

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Principal Component ◽

Component Analysis ◽

Image Feature ◽

Two Dimensional ◽

Image Feature Extraction


Feature Extraction Based on Mixture Probabilistic Kernel Principal Component Analysis

2009 International Forum on Information Technology and Applications ◽

10.1109/ifita.2009.11 ◽

2009 ◽

Author(s):

Zhao Huibo ◽

Pan Quan ◽

Cheng Yongmei

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis


Effects of Collagen–Glycosaminoglycan Mesh on Gene Expression as Determined by Using Principal Component Analysis-Based Unsupervised Feature Extraction

Polymers ◽

10.3390/polym13234117 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4117

Author(s):

Y-h. Taguchi ◽

Turki Turki

Keyword(s):

Gene Expression ◽

Principal Component Analysis ◽

Feature Extraction ◽

Cell Lines ◽

Time Course ◽

Expression Profiles ◽

Principal Component ◽

Component Analysis ◽

Altered Expression ◽

Unsupervised Feature Extraction

Developing medical applications for substances or materials that come into contact with cells is important. Hence, it is necessary to elucidate how substances that surround cells affect gene expression during incubation. In the current study, we compared the gene expression profiles of cell lines that were in contact with collagen–glycosaminoglycan mesh with those of control cells. Principal component analysis-based unsupervised feature extraction was applied to identify genes whose expression was altered during incubation in the treated cell lines but not in the controls. The identified genes were enriched in various biological terms. Our method also outperformed a conventional methodology, namely, gene selection based on linear regression with time course.


Feature extraction and reduced-order modelling of nitrogen plasma models using principal component analysis

Computers & Chemical Engineering ◽

10.1016/j.compchemeng.2018.05.012 ◽

2018 ◽

Vol 115 ◽

pp. 504-514 ◽

Cited By ~ 6

Author(s):

Aurélie Bellemans ◽

Gianmarco Aversano ◽

Axel Coussement ◽

Alessandro Parente

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Principal Component ◽

Component Analysis ◽

Nitrogen Plasma ◽

Reduced Order ◽

Reduced Order Modelling


Heuristic Principal Component Analysis-Based Unsupervised Feature Extraction and Its Application to Bioinformatics

Big Data Analytics in Bioinformatics and Healthcare - Advances in Bioinformatics and Biomedical Engineering ◽

10.4018/978-1-4666-6611-5.ch007 ◽

2015 ◽

pp. 138-162 ◽

Cited By ~ 18

Author(s):

Y-H. Taguchi ◽

Mitsuo Iwadate ◽

Hideaki Umeyama ◽

Yoshiki Murakami ◽

Akira Okamoto

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Principal Component ◽

Component Analysis ◽

Circulating Microrna ◽

Biomarker Identification ◽

Typical Situation ◽

Aberrant Dna Methylation ◽

The Stability ◽

Unsupervised Feature Extraction

Feature Extraction (FE) is difficult when the number of features is much larger than the number of samples, although this is a typical situation when biological (big) data are analyzed. It is especially difficult when FE must be stable, that is, independent of the samples considered (stable FE), as is often required. However, the stability of FE has not been considered seriously. In this chapter, the authors demonstrate that Principal Component Analysis (PCA)-based unsupervised FE functions as stable FE. Three bioinformatics applications of PCA-based unsupervised FE are discussed: detection of aberrant DNA methylation associated with diseases, biomarker identification using circulating microRNA, and proteomic analysis of bacterial culturing processes.
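The general idea of selecting features with PCA and no class labels can be sketched as follows. This is a simplified illustration, not the chapter's procedure: it ranks features by the magnitude of their loading on one principal component and keeps the top few, whereas the published method's selection criterion is not described in this abstract. The function name, the synthetic data, and the cut-off of five features are assumptions.

```python
import numpy as np

def rank_features_by_pc(X, component=0):
    """Rank features of X (samples x features) by |loading| on one PC.

    No class labels are used, so the selection is unsupervised.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.argsort(-np.abs(Vt[component]))

# Synthetic "many features, few samples" data: 12 samples, 200 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 200))
trend = np.linspace(-1.0, 1.0, 12)
X[:, :5] += 20.0 * trend[:, None]   # only features 0..4 carry a shared signal

selected = rank_features_by_pc(X)[:5]
print(sorted(selected.tolist()))
```

Stability in the chapter's sense could be probed by repeating the ranking on subsets of the samples and checking how much the selected set changes.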


Applying Weighted PCA on Multiclass Classification for Intrusion Detection

Privacy, Intrusion Detection and Response ◽

10.4018/978-1-60960-836-1.ch009 ◽

2011 ◽

pp. 220-241

Author(s):

Mohsen Moshki ◽

Mehran Garmehi ◽

Peyman Kabiri

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Intrusion Detection ◽

Principal Component ◽

Component Analysis ◽

Multiclass Classification ◽

Extended Version ◽

Number Of Classes

In this chapter, the application of Principal Component Analysis (PCA) and one of its extensions to intrusion detection is investigated. The extended version of PCA is modified to address an important shortcoming of traditional PCA. The modifications are first proved mathematically to be beneficial, and a known dataset, the DARPA99 dataset, is then used to verify the results experimentally. To verify the approach, traditional PCA is initially used to preprocess the dataset, and the effectiveness of multiclass classification is then studied using a simple classifier such as KNN. In the reported work, a revised version of PCA named Weighted PCA (WPCA) is used for feature extraction instead of traditional PCA. The results on the DARPA99 dataset show that this approach gives better accuracy than traditional PCA when the number of features is limited, the number of classes is large, and the class populations are unbalanced. In some situations WPCA outperforms traditional PCA by more than 1% in accuracy.
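The baseline described here, traditional PCA for preprocessing followed by a KNN classifier, can be sketched as below. The sketch does not implement WPCA, whose weighting scheme is not given in the abstract. The data are synthetic placeholders (not DARPA99), and the helper names, the two retained components, and k = 3 are assumptions.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the training mean and the leading principal axes."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def transform(X, mean, components):
    """Project X onto axes fitted on the training data only."""
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test row by majority vote among its k nearest neighbours."""
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])

# Synthetic three-class data standing in for preprocessed traffic records.
rng = np.random.default_rng(1)
y_train = np.repeat(np.arange(3), 20)
X_train = 5.0 * y_train[:, None] + rng.normal(size=(60, 10))
y_test = np.repeat(np.arange(3), 5)
X_test = 5.0 * y_test[:, None] + rng.normal(size=(15, 10))

mean, components = fit_pca(X_train, n_components=2)
pred = knn_predict(transform(X_train, mean, components), y_train,
                   transform(X_test, mean, components), k=3)
accuracy = float((pred == y_test).mean())
print(accuracy)
```

Fitting the projection on the training set alone keeps test records from influencing the extracted features, which matters when accuracies differ by margins as small as the 1% reported.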
