Severity Assessment of COVID-19 Using a CT-Based Radiomics Model

Stem Cells International ◽

10.1155/2021/2263469 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Zhigao Xu ◽

Lili Zhao ◽

Guoqiang Yang ◽

Ying Ren ◽

Jinlong Wu ◽

...

Keyword(s):

Large Scale ◽

Operating Characteristic ◽

Fixed Ratio ◽

ROC Curves ◽

Classification Model ◽

Training Dataset ◽

Support Vector ◽

SVM Classifier ◽

Test Dataset ◽

CT Features

The coronavirus disease of 2019 (COVID-19) has evolved into a worldwide pandemic. Although CT is sensitive in detecting lesions and assessing their severity, this assessment depends mainly on radiologists’ subjective judgment, which is inefficient during a large-scale outbreak. This work focuses on developing a CT-based radiomics model to assess whether COVID-19 patients are in the early, progressive, severe, or absorption stage of the disease. We retrospectively analyzed the CT images of 284 COVID-19 patients. The patients were divided into four groups (0-3), early (n = 75), progressive (n = 58), severe (n = 75), and absorption (n = 76), according to the progression of the disease and the CT features. Within each category, they were split randomly into training and test datasets at a fixed ratio of 7:3. Thirty-eight radiomic features were selected from 1688 radiomic features using the select-K-best method and the ElasticNet algorithm. On this basis, a support vector machine (SVM) classifier was trained to build the model. Receiver operating characteristic (ROC) curves were generated to determine the diagnostic performance of various models. The macro- and microaverage precision, recall, and F1-score of the classification model were 0.82, 0.82, 0.81, 0.81, 0.81, and 0.81 for the training dataset and 0.75, 0.73, 0.73, 0.72, 0.72, and 0.72 for the test dataset. The AUCs for groups 0, 1, 2, and 3 on the training dataset were 0.99, 0.97, 0.96, and 0.93, and the microaverage AUC was 0.97 with a macroaverage AUC of 0.97. On the test dataset, the AUCs for each group were 0.97, 0.86, 0.83, and 0.89, and the microaverage AUC was 0.89 with a macroaverage AUC of 0.90. The CT-based radiomics model proved efficacious in assessing the severity of COVID-19.
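
The abstract reports both macro- and microaveraged precision, recall, and F1-score. The sketch below shows one common way these averages are computed for a single-label, multi-class problem; it is not the authors' code, and the label lists are hypothetical stand-ins for the four stage groups (0-3). Macro averaging takes the unweighted mean of the per-class scores, while micro averaging pools the counts over all classes first.

```python
from collections import Counter


def averaged_scores(y_true, y_pred):
    """Return (macro, micro), each a (precision, recall, F1) tuple."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # class t was the truth but was missed

    def prf(tp_, fp_, fn_):
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    # Macro: unweighted mean of each per-class score.
    macro = tuple(sum(column) / len(labels) for column in zip(*per_class))
    # Micro: pool the counts over all classes, then score once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical labels, not data from the paper.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 1, 2, 3, 2, 3, 0]
macro, micro = averaged_scores(y_true, y_pred)
print("macro:", macro)
print("micro:", micro)
```

When every sample has exactly one true and one predicted label, each error counts once as a false positive and once as a false negative, so microaveraged precision, recall, and F1 all equal the overall accuracy. That matches the three identical microaverage values reported for each dataset above.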

Streamlining Quality Review of Mass Spectrometry Data in the Clinical Laboratory by Use of Machine Learning

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2018-0238-oa ◽

2019 ◽

Vol 143 (8) ◽

pp. 990-998 ◽

Cited By ~ 2

Author(s):

Min Yu ◽

Lindsay A. L. Bazydlo ◽

David E. Bruns ◽

James H. Harrison

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Turnaround Time ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Test Dataset ◽

Manual Review

Context.— Turnaround time and productivity of clinical mass spectrometric (MS) testing are hampered by time-consuming manual review of the analytical quality of MS data before release of patient results. Objective.— To determine whether a classification model created by using standard machine learning algorithms can verify analytically acceptable MS results and thereby reduce manual review requirements. Design.— We obtained retrospective data from gas chromatography–MS analyses of 11-nor-9-carboxy-delta-9-tetrahydrocannabinol (THC-COOH) in 1267 urine samples. The data for each sample had been labeled previously as either analytically unacceptable or acceptable by manual review. The dataset was randomly split into training and test sets (848 and 419 samples, respectively), maintaining equal proportions of acceptable (90%) and unacceptable (10%) results in each set. We used stratified 10-fold cross-validation in assessing the abilities of 6 supervised machine learning algorithms to distinguish unacceptable from acceptable assay results in the training dataset. The classifier with the highest recall was used to build a final model, and its performance was evaluated against the test dataset. Results.— In comparison testing of the 6 classifiers, a model based on the Support Vector Machines algorithm yielded the highest recall and acceptable precision. After optimization, this model correctly identified all unacceptable results in the test dataset (100% recall) with a precision of 81%. Conclusions.— Automated data review identified all analytically unacceptable assays in the test dataset, while reducing the manual review requirement by about 87%. This automation strategy can focus manual review only on assays likely to be problematic, allowing improved throughput and turnaround time without reducing quality.
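
The reported recall, precision, and workload reduction can be cross-checked with simple arithmetic. The sketch below is a rough estimate, not a calculation from the paper: only the test-set size, the 10% unacceptable proportion, the recall, and the precision come from the abstract, and the rounded sample counts are derived assumptions.

```python
# Values taken from the abstract.
test_size = 419
unacceptable_fraction = 0.10
recall = 1.00
precision = 0.81

# Derived estimates (assumptions, not reported counts).
unacceptable = round(test_size * unacceptable_fraction)  # about 42 samples
true_positives = round(unacceptable * recall)            # all unacceptable results are flagged
flagged = round(true_positives / precision)              # about 52 samples sent to manual review
review_reduction = 1 - flagged / test_size

print(f"flagged for manual review: {flagged} of {test_size}")
print(f"estimated reduction in manual review: {review_reduction:.1%}")
```

Under these assumptions roughly 52 of 419 results would still need manual review, which is in line with the reduction of about 87% stated in the abstract.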

The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification

Iraqi Journal of Science ◽

10.24996/ijs.2020.61.12.28 ◽

2020 ◽

pp. 3397-3407

Author(s):

Nur Syafiqah Mohd Nafis ◽

Suryanti Awang

Keyword(s):

Feature Selection ◽

Text Classification ◽

Training Dataset ◽

Recursive Feature Elimination ◽

High Dimensional ◽

Significant Feature ◽

Support Vector ◽

SVM Classifier ◽

Text Documents ◽

Text Document

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique is able to measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of the feature selection and thereby obtain a promising text classification accuracy. TF-IDF acts as a filter approach that measures the importance of the features in the text documents at the first stage. SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research executes sets of experiments using text documents retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
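
The first, filter stage described above can be illustrated with a small TF-IDF ranking. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses one common TF-IDF variant (relative term frequency times log(N / document frequency)), sums the scores over documents to rank terms, and the function names and toy documents are hypothetical. The second stage (SVM-RFE) is not shown.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Sum each term's TF-IDF over a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = Counter()
    for tokens in documents:
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


def top_k_terms(documents, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(tfidf_scores(documents).items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus standing in for pre-processed posts.
docs = [
    "great phone great battery".split(),
    "terrible battery life".split(),
    "great camera".split(),
]
print(top_k_terms(docs, 3))
```

A term that occurs in every document receives an inverse document frequency of log(1) = 0 in this variant, so it drops to the bottom of the ranking regardless of how often it appears.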

PROTEIN FOLD CLASSIFICATION WITH GENETIC ALGORITHMS AND FEATURE SELECTION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004321 ◽

2009 ◽

Vol 07 (05) ◽

pp. 773-788 ◽

Cited By ~ 2

Author(s):

PENG CHEN ◽

CHUNMEI LIU ◽

LEGAND BURGE ◽

MOHAMMAD MAHMOOD ◽

WILLIAM SOUTHERLAND ◽

...

Keyword(s):

Support Vector Machine ◽

Genetic Algorithms ◽

Feature Selection ◽

Training Dataset ◽

Support Vector ◽

Protein Fold ◽

Classification Rate ◽

Test Dataset ◽

Feature Vectors ◽

Protein Folds

Protein fold classification is a key step to predicting protein tertiary structures. This paper proposes a novel approach based on genetic algorithms and feature selection to classifying protein folds. Our dataset is divided into a training dataset and a test dataset. Each individual for the genetic algorithms represents a selection function of the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate the fitness value (fold classification rate) of each individual. The aim of the genetic algorithms is to search for the best individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset and a support vector machine is built to classify protein folds based on selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors.
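
The search loop described above, in which each individual encodes a feature selection and is scored by a fitness function, can be sketched as a simple genetic algorithm over 0/1 masks. This is an illustrative sketch, not the authors' method: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the names are assumptions, and `toy_fitness` is a hypothetical stand-in for the SVM fold classification rate used in the paper.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(pop):
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(population), tournament(population)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


INFORMATIVE = {0, 3, 5}  # hypothetical indices of the useful features


def toy_fitness(mask):
    """Stand-in fitness: reward informative features, penalize the rest."""
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(bit for i, bit in enumerate(mask) if i not in INFORMATIVE)
    return hits - 0.5 * extras


best_mask = genetic_feature_selection(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

In a setting like the paper's, the fitness call would train and evaluate a classifier on the training data restricted to the selected features, which makes each evaluation far more expensive than this toy function.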

Automated Detection of Paleoenvironmental Proxy, Eucampia Index, in a Microscopic Slide Using a Convolutional Neural Network System

10.21203/rs.3.rs-88945/v1 ◽

2020 ◽

Author(s):

Saki Ishino ◽

Takuya Itaki

Keyword(s):

Southern Ocean ◽

Large Scale ◽

Classification Performance ◽

Automated Detection ◽

Model Verification ◽

Training Dataset ◽

Test Dataset ◽

Counting Error ◽

Index Value ◽

Particle Images

The Eucampia Index, which is calculated from the valve ratio of varieties of the Antarctic diatom Eucampia antarctica, is expected to be a useful indicator of variation in sea ice coverage and/or sea surface temperature in the Southern Ocean. To verify the relationship between the index value and these environmental factors, considerable effort is needed to classify and count valves of E. antarctica in a very large number of samples. In this study, to automate detection of the Eucampia Index, we constructed deep-learning-based models for identifying Eucampia valves among the various particles in a diatom slide. The microfossil Classification and Rapid Accumulation Device (miCRAD) system, which scans a slide and crops images of particles automatically, was employed to collect images for the training dataset and for the test dataset used in model verification. When particle images in the test dataset were classified by the initial model "Eant_1000px_200616", the accuracy was 78.8%. The Eucampia Index value prepared in the test dataset was 0.80, and the value predicted by the developed model from the same dataset was 0.76. The predicted value was within the range of the manual counting error. These results suggest that the classification performance of the model is similar to that of a human expert. This study shows for the first time that a model capable of detecting the ratio of two diatom species can be constructed using the miCRAD system. The miCRAD system connected with the developed model can classify particle images automatically as they are captured, so the system can be applied to a large-scale analysis of the Eucampia Index in the Southern Ocean. Depending on how the classification categories are set, a similar method is relevant to investigators who must process a large number of diatom samples, for example to detect specific species for biostratigraphic and paleoenvironmental studies.

Ensemble Learning Prediction of Drug-Target Interactions Using GIST Descriptor Extracted from PSSM-Based Evolutionary Information

BioMed Research International ◽

10.1155/2020/4516250 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Xinke Zhan ◽

Zhuhong You ◽

Changqing Yu ◽

Liping Li ◽

Jie Pan

Keyword(s):

Drug Target ◽

Large Scale ◽

Target Pair ◽

Evolutionary Information ◽

Support Vector ◽

SVM Classifier ◽

New Drug ◽

Golden Standard ◽

Scoring Matrix ◽

G Protein Coupled

Identifying drug-target interactions (DTIs) plays an essential role in new drug development. However, knowledge of DTIs is still limited, and a significant number of DTI pairs remain unknown. Moreover, traditional experimental methods have inevitable disadvantages such as high cost and long duration. Therefore, developing computational methods for predicting DTIs is attracting more and more attention. In this study, we report a novel computational approach for predicting DTIs using the GIST feature, the position-specific scoring matrix (PSSM), and rotation forest (RF). Specifically, each target protein is first converted into a PSSM to retain evolutionary information. Then, the GIST feature is extracted from the PSSM, and substructure fingerprint information is adopted to extract the feature of the drug. Finally, the protein and drug features are combined to form a new drug-target pair, which is employed as the input feature for the RF classifier. In the experiment, the proposed method achieves high average accuracies of 89.25%, 85.93%, 82.36%, and 73.89% on enzyme, ion channel, G protein-coupled receptors (GPCRs), and nuclear receptor, respectively. To further evaluate the prediction performance of the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the same gold standard dataset. These promising results illustrate that the proposed method is more effective and stable than other methods. We expect the proposed method to be a useful tool for predicting large-scale DTIs.

Classification of Hyperspectral In Vivo Brain Tissue Based on Linear Unmixing

Applied Sciences ◽

10.3390/app10165686 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5686

Author(s):

Ines A. Cruz-Guerrero ◽

Raquel Leon ◽

Daniel U. Campos-Delgado ◽

Samuel Ortega ◽

Himar Fabelo ◽

...

Keyword(s):

Brain Tissue ◽

Classification Performance ◽

Training Dataset ◽

Support Vector ◽

SVM Classifier ◽

Tissue Classification ◽

Processing Times ◽

Main Challenge ◽

Linear Unmixing

Hyperspectral imaging is a multidimensional optical technique with the potential of providing fast and accurate tissue classification. The main challenge is the adequate processing of the multidimensional information usually linked to long processing times and significant computational costs, which require expensive hardware. In this study, we address the problem of tissue classification for intraoperative hyperspectral images of in vivo brain tissue. For this goal, two methodologies are introduced that rely on a blind linear unmixing (BLU) scheme for practical tissue classification. Both methodologies identify the characteristic end-members related to the studied tissue classes by BLU from a training dataset and classify the pixels by a minimum distance approach. The proposed methodologies are compared with a machine learning method based on a supervised support vector machine (SVM) classifier. The methodologies based on BLU achieve speedup factors of ~459× and ~429× compared to the SVM scheme, while keeping constant and even slightly improving the classification performance.
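
The classification step described above, assigning each pixel to the closest characteristic end-member, can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the abstract says only that a minimum distance approach is used, so the Euclidean distance is an assumption, and the class names and four-band spectra are hypothetical. In the paper the end-members are estimated by BLU from a training dataset, a step not shown here.

```python
import math


def classify_pixels(pixels, endmembers):
    """Label each pixel spectrum with the name of its nearest end-member."""
    labels = []
    for spectrum in pixels:
        nearest = min(
            endmembers,
            key=lambda name: math.dist(spectrum, endmembers[name]),  # Euclidean distance (assumed)
        )
        labels.append(nearest)
    return labels


# Hypothetical four-band end-member spectra, one per tissue class.
endmembers = {
    "tissue_a": [0.2, 0.4, 0.6, 0.5],
    "tissue_b": [0.5, 0.5, 0.3, 0.2],
    "tissue_c": [0.1, 0.1, 0.2, 0.6],
}
# Hypothetical pixel spectra to classify.
pixels = [
    [0.22, 0.38, 0.58, 0.52],
    [0.48, 0.52, 0.33, 0.18],
    [0.12, 0.15, 0.20, 0.55],
]
print(classify_pixels(pixels, endmembers))
```

Each pixel needs only one distance computation per class, which is one plausible reason a scheme of this kind can run much faster than evaluating a trained SVM on every pixel.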

Cooperative Hybrid Semi-Supervised Learning for Text Sentiment Classification

Symmetry ◽

10.3390/sym11020133 ◽

2019 ◽

Vol 11 (2) ◽

pp. 133 ◽

Cited By ~ 2

Author(s):

Yang Li ◽

Ying Lv ◽

Suge Wang ◽

Jiye Liang ◽

Juanzi Li ◽

...

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Ensemble Classifier ◽

Sentiment Classification ◽

Training Dataset ◽

Support Vector ◽

Seed Selection ◽

Training Strategy ◽

Whole Process ◽

Self Learning

A large-scale, high-quality training dataset is important for learning an ideal classifier for text sentiment classification. However, manually constructing such a training dataset with sentiment labels is a labor-intensive and time-consuming task. Therefore, based on the idea of effectively utilizing unlabeled samples, this paper proposes a synthetic framework for text sentiment classification that covers the whole process of semi-supervised learning, from seed selection and iterative modification of the training text set to the co-training strategy of the classifier. To provide a basis for selecting the seed texts and modifying the training text set, three kinds of measures are defined: the cluster similarity degree of an unlabeled text, the cluster uncertainty degree of a pseudo-label text to a learner, and the reliability degree of a pseudo-label text to a learner. With these measures, a seed selection method based on Random Swap clustering, a hybrid modification method of the training text set based on active learning and self-learning, and an alternating co-training strategy for the ensemble classifier of Maximum Entropy and Support Vector Machine are proposed and combined into our framework. Experimental results on three real-world Chinese datasets (COAE2014, COAE2015, and a Hotel review) and five real-world English datasets (Books, DVD, Electronics, Kitchen, and MR) verify the effectiveness of the proposed framework.

Sequence-Based Discovery of Antibacterial Peptides Using Ensemble Gradient Boosting

Proceedings ◽

10.3390/proceedings2020066006 ◽

2020 ◽

Vol 66 (1) ◽

pp. 6

Author(s):

Ehdieh Khaledian ◽

Shira L. Broschat

Keyword(s):

Operating Characteristic ◽

Area Under The Curve ◽

ROC Curves ◽

Gradient Boosting ◽

Support Vector ◽

Antibacterial Peptides ◽

Therapeutic Approaches ◽

Laboratory Procedures ◽

Feature Selection Techniques ◽

Better Than

Antimicrobial resistance is driving pharmaceutical companies to investigate different therapeutic approaches. One approach that has garnered growing consideration in drug development is the use of antimicrobial peptides (AMPs). Antibacterial peptides (ABPs), which occur naturally as part of the immune response, can serve as powerful, broad-spectrum antibiotics. However, conventional laboratory procedures for screening and discovering ABPs are expensive and time-consuming. Identification of ABPs can be significantly improved using computational methods. In this paper, we introduce a machine learning method for the fast and accurate prediction of ABPs. We gathered more than 6000 peptides from publicly available datasets and extracted 1209 features (peptide characteristics) from these sequences. We selected the set of optimal features by applying correlation-based and random forest feature selection techniques. Finally, we designed an ensemble gradient boosting model (GBM) to predict putative ABPs. We evaluated our model using receiver operating characteristic (ROC) curves, calculating the area under the curve (AUC) for several different models for comparison, including a recurrent neural network, a support vector machine, and iAMPpred. The AUC for the GBM was ~0.98, more than 3% better than any of the other models.
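
The AUC used above to compare models has a simple interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustration rather than the authors' evaluation code, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = antibacterial peptide) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based formulation gives the same value more efficiently on datasets with thousands of peptides.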

The classification of skateboarding tricks via transfer learning pipelines

PeerJ Computer Science ◽

10.7717/peerj-cs.680 ◽

2021 ◽

Vol 7 ◽

pp. e680

Author(s):

Muhammad Amirul Abdullah ◽

Muhammad Ar Rahim Ibrahim ◽

Muhammad Nur Aiman Shapiee ◽

Muhammad Aizzat Zakaria ◽

Mohd Azraai Mohd Razman ◽

...

Keyword(s):

Transfer Learning ◽

Input Image ◽

Computational Time ◽

Support Vector ◽

SVM Classifier ◽

Test Accuracy ◽

Learning Models ◽

Test Dataset ◽

Flat Ground

This study aims to classify flat ground tricks, namely Ollie, Kickflip, Shove-it, Nollie and Frontside 180, by identifying significant input image transformations on different transfer learning models with an optimized Support Vector Machine (SVM) classifier. Six amateur skateboarders (20 ± 7 years of age with at least 5.0 years of experience) executed each type of trick five times on a customized ORY skateboard (IMU sensor fused) on cemented ground. From the IMU data, six raw signals were extracted. Two input image types, namely raw data (RAW) and Continuous Wavelet Transform (CWT), as well as six transfer learning models from three different families, along with a grid-searched optimized SVM, were investigated for their efficacy in classifying the skateboarding tricks. The study showed that RAW and CWT input images on the MobileNet, MobileNetV2 and ResNet101 transfer learning models achieved the best test accuracy, 100% on the test dataset. Nonetheless, on evaluating the computational time of the best models, the CWT-MobileNet-Optimized SVM pipeline was found to be the best. It could be concluded that the proposed method can assist judges and coaches in identifying the execution of skateboarding tricks.

Data Mining Technology Application in False Text Information Recognition

Mobile Information Systems ◽

10.1155/2021/4206424 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Jie Wan ◽

Xue Cao ◽

Kun Yao ◽

Donghui Yang ◽

E. Peng ◽

...

Keyword(s):

Data Mining ◽

Classification Model ◽

Support Vector ◽

SVM Classifier ◽

Characteristic Matrix ◽

Mining Technology ◽

Technology Application ◽

Text Information ◽

The Government ◽

Effect Of The Support

False information on the Internet is regarded as a serious harm to society. To recognize false text information, this paper proposes an effective method for mining text features in the field of false drug advertisements. Firstly, data on false and real drug advertisements were collected from official websites to build a database of false and real drug advertisements. Secondly, by performing feature extraction on the text of the drug advertisements, this work built a characteristic matrix based on the effective features and assigned a positive or negative label to each feature vector of the matrix according to whether it represents a false drug advertisement. Thirdly, this study trained and tested several different classifiers, selected the classification model with the best performance in identifying false drug advertisements, and found the key characteristics that determine the classification. Finally, the model with the best performance was used to predict new false drug advertisements collected from Sina Weibo. In identifying false drug advertisements, the support vector machine (SVM) classifier built on the feature set after feature selection was the most effective. The findings of this study can provide an effective method for the government to identify and combat false advertisements. This study also serves as a reference for using text data mining technology to identify and detect information fraud.
