Non-invasive classification of non-small cell lung cancer: a comparison between random forest models utilising radiomic and semantic features

Usman Bashir; Bhavin Kawa; Muhammad Siddique; Sze Mun Mak; Arjun Nair; Emma Mclean; Andrea Bille; Vicky Goh; Gary Cook

doi:10.1259/bjr.20190159

British Journal of Radiology ◽

10.1259/bjr.20190159 ◽

2019 ◽

Vol 92 (1099) ◽

pp. 20190159 ◽

Cited By ~ 4

Author(s):

Usman Bashir ◽

Bhavin Kawa ◽

Muhammad Siddique ◽

Sze Mun Mak ◽

Arjun Nair ◽

...

Keyword(s):

Random Forest ◽

Test Data ◽

Ct Scans ◽

Semantic Features ◽

Small Cell Lung ◽

Data Set ◽

Non Invasive ◽

Forest Models ◽

Random Forest Models

Objective: Non-invasive distinction between the squamous cell carcinoma and adenocarcinoma subtypes of non-small-cell lung cancer (NSCLC) may benefit patients who are unfit for invasive diagnostic procedures, or cases in which tissue is insufficient for diagnosis. The purpose of our study was to compare the performance of random forest algorithms utilizing CT radiomics and/or semantic features in classifying NSCLC.
Methods: Two thoracic radiologists scored 11 semantic features on CT scans of 106 patients with NSCLC. A set of 115 radiomics features was extracted from the CT scans. Random forest models were developed from semantic features (RM-sem), radiomics features (RM-rad), and all features combined (RM-all). External validation of the models was performed using an independent test data set (n = 100) of CT scans. Model performance was measured with out-of-bag error and area under the curve (AUC), and compared using receiver-operating characteristic curve analysis on the test data set.
Results: The median (interquartile range) error rates of the models were: RM-sem 24.5% (22.6–37.5%), RM-rad 35.8% (34.9–38.7%), and RM-all 37.7% (37.7–37.7%). On training data, both RM-rad and RM-all gave perfect discrimination (AUC = 1), which was significantly higher than that achieved by RM-sem (AUC = 0.78; p < 0.0001). On test data, however, the RM-sem model (AUC = 0.82) outperformed RM-rad and RM-all (AUC = 0.5 and AUC = 0.56; p < 0.0001), neither of which was significantly different from random guessing (p = 0.9 and 0.6, respectively).
Conclusion: Non-invasive classification of NSCLC can be done accurately using random forest classification models based on well-known CT-derived descriptive features. However, radiomics-based classification models performed poorly in this scenario when tested on independent data and should be used with caution, owing to their possible lack of generalizability to new data.
Advances in knowledge: Our study describes novel CT-derived random forest models based on radiologist interpretation of CT scans (semantic features) that can assist NSCLC classification when histopathology is equivocal or when histopathological sampling is not possible. It also shows that random forest models based on semantic features may be more useful than those built from computational radiomic features.
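The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so 1 means perfect ranking and 0.5 is what an uninformative score would give. The following is a minimal sketch in plain Python under that reading; the function name `auc_from_scores` and all numbers are illustrative assumptions, not the study's code or data.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney view).

    labels: 1 for a positive case, 0 for a negative case.
    scores: higher means "more likely positive".
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative data only (not taken from the study).
labels = [1, 1, 1, 0, 0, 0]
perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # every positive outranks every negative
uninformative = [0.5] * 6                 # a constant score carries no information

auc_perfect = auc_from_scores(labels, perfect)
auc_chance = auc_from_scores(labels, uninformative)
```

A model that scores AUC = 1 on its own training data but close to 0.5 on independent data, as reported for the radiomics models, is ranking unseen cases no better than a constant score would.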

Smarty Pants: Exploring Textile Pressure Sensors in Trousers for Posture and Behaviour Classification

Proceedings ◽

10.3390/proceedings2019032019 ◽

2019 ◽

Vol 32 (1) ◽

pp. 19

Author(s):

Skach ◽

Stewart ◽

Healey

Keyword(s):

Random Forest ◽

Social Behaviour ◽

Pressure Sensors ◽

Application Area ◽

Body Postures ◽

Textile Sensors ◽

Forest Models ◽

Random Forest Models ◽

Potential Use

In this paper, we introduce a new modality for capturing body postures and social behaviour. Conversely, we propose a new application area for on-body textile sensors. We have developed “smart trousers” with embedded textile pressure sensors that allow for the classification of a large variety of postural movements as well as interactional states. Random forest models are used to investigate these postures and states. Here, we give an overview of the research conducted and discuss potential use cases of the presented design.

NIMG-42. MP-MRI-BASED TUMOR PROBABILITY MAPS TRAINED USING AUTOPSY TISSUE SAMPLES AS GROUND TRUTH NON-INVASIVELY PREDICT INFILTRATIVE TUMOR BEYOND THE CONTRAST ENHANCING REGION

Neuro-Oncology ◽

10.1093/neuonc/noab196.541 ◽

2021 ◽

Vol 23 (Supplement_6) ◽

pp. vi138-vi138

Author(s):

Samuel Bobholz ◽

Allison Lowman ◽

Michael Brehler ◽

John Sherman ◽

Savannah Duenweg ◽

...

Keyword(s):

Random Forest ◽

Contrast Enhancement ◽

Ground Truth ◽

Bayes Classifier ◽

Data Set ◽

Tissue Samples ◽

Probability Maps ◽

Forest Models ◽

Random Forest Models ◽

Infiltrative Tumor

Infiltrative glioma beyond contrast enhancement on MRI is often difficult to identify with conventional imaging. In this study, we use large-format autopsy samples aligned to multi-parametric MRI to test the hypothesis that radio-pathomic machine learning models are able to accurately identify areas of infiltrative tumor beyond the contrast-enhancing region. At autopsy, 140 tissue samples from 62 brain cancer patients were collected from brain slices sectioned to align with the patients’ last clinical MRI prior to death. Cell, extra-cellular fluid (ECF), and cytoplasm densities were computed from digitized, hematoxylin and eosin-stained samples, and a subset of 20 slides from 9 patients was annotated for tumor presence by a pathologist-trained technician. In-house custom software was used to align the tissue samples to the patients’ last clinical imaging, which included pre- and post-contrast T1, FLAIR, and ADC images. Bagging random forest models were then trained to predict cellularity, ECF, and cytoplasm density using 5-by-5 voxel tiles from each MRI as input. A 2/3-1/3 train-test split was used to validate model generalizability. A naïve Bayes classifier was trained to predict tumor class using cellularity, ECF, and cytoplasm segmentations within the annotation data set, again using a 2/3-1/3 train-test split to validate performance. The random forest models each accurately predicted cellularity, ECF, and cytoplasm density within the test data set, with root-mean-squared error values for each falling within one standard deviation of the ground truth. The histology-based tumor prediction model accurately predicted tumor, with a test set ROC AUC of 0.86. When using whole-brain cellularity, ECF, and cytoplasm predictions from the random forest models as inputs for the naïve Bayes classifier, tumor probability maps identified regions of infiltrative tumor beyond contrast enhancement.
Our results suggest that radio-pathomic maps of tumor probability accurately identify regions of infiltrative tumor beyond currently accepted MRI signatures.
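The validation described in this abstract combines a 2/3-1/3 train-test split with a check that the root-mean-squared error (RMSE) falls within one standard deviation of the ground truth. A rough sketch of both steps in plain Python follows. It assumes the check means "RMSE no larger than the standard deviation of the true values", which is one reading of the abstract and not something it states; the helper names and numbers are hypothetical.

```python
import math
import random
import statistics


def split_two_thirds(items, seed=0):
    """Shuffle a copy of `items` and split it 2/3 (train) to 1/3 (test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def rmse(truth, predicted):
    """Root-mean-squared error between paired true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(truth, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only (not from the study).
truth = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]  # each prediction is off by exactly 1

error = rmse(truth, predicted)
spread = statistics.pstdev(truth)           # population SD of the true values
within_one_sd = error <= spread             # assumed form of the abstract's criterion

train, test = split_two_thirds(range(9))
```

Whether the split in the study was made per tissue sample or per patient is not stated in the abstract; the helper above simply shuffles whatever units it is given.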

Limitations of using surrogates for behaviour classification of accelerometer data: refining methods using random forest models in Caprids

Movement Ecology ◽

10.1186/s40462-021-00265-7 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Eleanor R. Dickinson ◽

Joshua P. Twining ◽

Rory Wilson ◽

Philip A. Stephens ◽

Jennie Westander ◽

...

Keyword(s):

Random Forest ◽

Model Performance ◽

Similar Species ◽

Accelerometer Data ◽

Alpine Ibex ◽

Forest Models ◽

Random Forest Models ◽

In The Wild ◽

Elusive Species

Background: Animal-attached devices can be used on cryptic species to measure their movement and behaviour, enabling unprecedented insights into fundamental aspects of animal ecology and behaviour. However, direct observations of subjects are often still necessary to translate biologging data accurately into meaningful behaviours. As many elusive species cannot easily be observed in the wild, captive or domestic surrogates are typically used to calibrate data from devices. However, the utility of this approach remains equivocal.
Methods: Here, we assess the validity of using captive conspecifics and phylogenetically similar domesticated counterparts (surrogate species) for calibrating behaviour classification. Tri-axial accelerometers and tri-axial magnetometers were used with behavioural observations to build random forest models to predict the behaviours. We applied these methods using captive Alpine ibex (Capra ibex) and a domestic counterpart, pygmy goats (Capra aegagrus hircus), to predict the behaviour, including terrain slope for locomotion behaviours, of captive Alpine ibex.
Results: Behavioural classification of captive Alpine ibex and domestic pygmy goats was highly accurate (> 98%). Model performance was reduced when using data split per individual, i.e., classifying behaviour of individuals not used to train models (mean ± sd = 56.1 ± 11%). Behavioural classifications using domestic counterparts, i.e., pygmy goat observations to predict ibex behaviour, however, were not sufficient to predict all behaviours of a phylogenetically similar species accurately (> 55%).
Conclusions: We demonstrate methods to refine the use of random forest models to classify behaviours of both captive and free-living animal species. We suggest there are two main reasons for reduced accuracy when using a domestic counterpart to predict the behaviour of a wild species in captivity: domestication leading to morphological differences, and the terrain of the environment in which the animals were observed. We also identify limitations when behaviour is predicted in individuals that are not used to train models. Our results demonstrate that biologging device calibration needs to be conducted (i) with similar conspecifics, and (ii) in an area where they can perform behaviours on terrain that reflects that of the species in the wild.
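The drop in accuracy when data are "split per individual" reflects a split in which no animal contributes observations to both the training and the test set. One common way to set this up is leave-one-individual-out, sketched below in plain Python. The abstract does not say which exact scheme the authors used, so this is an assumption, and the function name and animal identifiers are hypothetical.

```python
from collections import defaultdict


def leave_one_individual_out(individual_ids):
    """Yield (held_out_id, train_indices, test_indices), one fold per individual.

    individual_ids[i] names the animal that produced observation i, so every
    observation from the held-out animal lands in the test set and none of
    them can leak into training.
    """
    by_individual = defaultdict(list)
    for index, individual in enumerate(individual_ids):
        by_individual[individual].append(index)
    for held_out in sorted(by_individual):
        test_indices = by_individual[held_out]
        train_indices = [
            i
            for individual, indices in by_individual.items()
            if individual != held_out
            for i in indices
        ]
        yield held_out, train_indices, test_indices


# Hypothetical observation labels: which animal each row came from.
ids = ["ibex_a", "ibex_a", "ibex_b", "ibex_b", "ibex_c"]
folds = list(leave_one_individual_out(ids))
```

A random row-level split would instead place observations from the same animal on both sides, which tends to give more optimistic accuracy than testing on unseen individuals.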

Behavioral consistency in the digital age

10.31234/osf.io/r5wtn ◽

2021 ◽

Author(s):

Heather Shaw ◽

Paul Taylor ◽

David Alexander Ellis ◽

Stacey Conchie

Keyword(s):

Random Forest ◽

Test Data ◽

Digital Age ◽

Behavioral Consistency ◽

Behavioral Stability ◽

Trait Level ◽

Forest Models ◽

Random Forest Models ◽

Digital Footprints

Efforts to infer personality from digital footprints have focused on behavioral stability at the trait level without considering situational dependency. We repeat Shoda, Mischel, and Wright’s (1994) classic study of intraindividual consistency with data on 28,692 days of smartphone usage by 780 people. Using per app measures of ‘pickup’ frequency and usage duration, we found that profiles of daily smartphone usage were significantly more consistent when taken from the same user than from different users (d > 1.46). Random forest models trained on 6 days of behavior identified each of the 780 users in test data with 35.8% / 38.5% (pickup / duration) accuracy. This increased to 73.5% / 75.3% when success was taken as the user appearing in the top 10 predictions (i.e., top 1%). Thus, situation-dependent stability in behavior is present in our digital lives and its uniqueness provides both opportunities and risks to privacy.
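The two accuracy figures in this abstract differ only in what counts as a success: the true user being the single top prediction, or the true user appearing anywhere in the ten highest-ranked predictions. A minimal sketch of that top-k measure in plain Python is given below; the function name and the toy rankings are illustrative assumptions, not the study's code or data.

```python
def top_k_accuracy(true_ids, ranked_predictions, k):
    """Fraction of cases whose true identity is among the first k ranked guesses.

    ranked_predictions[i] lists candidate identities for case i, ordered from
    most to least likely.
    """
    hits = sum(
        1
        for truth, ranked in zip(true_ids, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_ids)


# Toy example with four test cases and three ranked guesses each.
true_ids = ["u1", "u2", "u3", "u4"]
ranked = [
    ["u1", "u2", "u3"],  # correct at rank 1
    ["u3", "u2", "u1"],  # correct at rank 2
    ["u1", "u4", "u3"],  # correct at rank 3
    ["u2", "u1", "u3"],  # true user never listed
]

top1 = top_k_accuracy(true_ids, ranked, k=1)
top2 = top_k_accuracy(true_ids, ranked, k=2)
top3 = top_k_accuracy(true_ids, ranked, k=3)
```

Top-k accuracy can only stay the same or rise as k grows, which is why the top-10 figures are higher than the top-1 figures.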

Data-Driven Wildfire Risk Prediction in Northern California

Atmosphere ◽

10.3390/atmos12010109 ◽

2021 ◽

Vol 12 (1) ◽

pp. 109

Author(s):

Ashima Malik ◽

Megha Rajam Rao ◽

Nandini Puppala ◽

Prathusha Koouri ◽

Venkata Anil Kumar Thota ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Curves ◽

Data Driven ◽

Northern California ◽

Combined Model ◽

Wildfire Risk ◽

Study Results ◽

Forest Models ◽

Random Forest Models

Over the years, rampant wildfires have plagued the state of California, causing economic and environmental losses. In 2018, wildfires cost nearly 800 million dollars in economic loss and claimed more than 100 lives in California. Over 1.6 million acres of land burned, causing extensive environmental damage. Although researchers have recently introduced machine learning models and algorithms for predicting wildfire risk, these efforts focused on particular perspectives and were restricted to a limited number of data parameters. In this paper, we propose two data-driven machine learning approaches based on random forest models to predict the wildfire risk in areas near Monticello and Winters, California. The study demonstrates how the models were developed and applied with comprehensive data parameters such as powerlines, terrain, and vegetation, considered from different perspectives, which improved the spatial and temporal accuracy of wildfire risk prediction, including fire ignition. The combined model uses the spatial and temporal parameters as a single combined dataset to train and predict the fire risk, whereas the ensemble model was fed separate parameters that were later stacked to work as a single model. Our experiment shows that the combined model produced more accurate results than the ensemble of random forest models on separate spatial data. The models were validated with receiver operating characteristic (ROC) curves, learning curves, and evaluation metrics such as accuracy, confusion matrices, and classification reports. The study achieved a cutting-edge accuracy of 92% in predicting wildfire risk, including ignition, by using regional spatial and temporal data along with standard data parameters in Northern California.

Incorporating space and time into random forest models for analyzing geospatial patterns of drug-related crime incidents in a major U.S. metropolitan area

Computers Environment and Urban Systems ◽

10.1016/j.compenvurbsys.2021.101599 ◽

2021 ◽

Vol 87 ◽

pp. 101599

Author(s):

Zhiyue Xia ◽

Kathleen Stewart ◽

Junchuan Fan

Keyword(s):

Random Forest ◽

Metropolitan Area ◽

Space And Time ◽

Forest Models ◽

Random Forest Models

Landslide susceptibility assessment for a transmission line in Gansu Province, China by using a hybrid approach of fractal theory, information value, and random forest models

Environmental Earth Sciences ◽

10.1007/s12665-021-09737-w ◽

2021 ◽

Vol 80 (12) ◽

Author(s):

Binbin Zhao ◽

Yunfeng Ge ◽

Hongzhi Chen

Keyword(s):

Random Forest ◽

Landslide Susceptibility ◽

Fractal Theory ◽

Hybrid Approach ◽

Gansu Province ◽

Information Value ◽

Susceptibility Assessment ◽

Landslide Susceptibility Assessment ◽

Forest Models ◽

Random Forest Models

Random forest models of 305-days milk yield for Holstein cows in Bulgaria

10.1063/5.0034778 ◽

2020 ◽

Author(s):

A. Yordanova ◽

H. Kulina

Keyword(s):

Random Forest ◽

Milk Yield ◽

Holstein Cows ◽

Forest Models ◽

Random Forest Models

Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2012040103 ◽

2012 ◽

Vol 8 (2) ◽

pp. 44-63 ◽

Cited By ~ 30

Author(s):

Baoxun Xu ◽

Joshua Zhexue Huang ◽

Graham Williams ◽

Qiang Wang ◽

Yunming Ye

Keyword(s):

Random Forest ◽

High Dimensional Data ◽

Real Life ◽

Classification Performance ◽

Feature Weighting ◽

Random Forest Model ◽

High Dimensional ◽

Forest Model ◽

Forest Models ◽

Random Forest Models

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for the subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contain many features that are uninformative for classification, and random sampling often fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and thereby enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that using a subspace size of features where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.
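The core idea of this abstract, replacing uniform feature sampling with weighted sampling when building each tree's subspace, can be illustrated with a short sketch in plain Python. The abstract does not describe how the authors compute their weights or draw the sample, so the weights below are arbitrary placeholders and the sequential weighted draw is a generic stand-in, not the paper's method; `sample_weighted_subspace` is a hypothetical name.

```python
import random


def sample_weighted_subspace(weights, size, rng):
    """Draw `size` distinct feature indices, favouring higher-weight features.

    weights[i] is a non-negative informativeness score for feature i
    (placeholder values here). Features with zero weight are never chosen.
    """
    remaining = [i for i, w in enumerate(weights) if w > 0]
    if size > len(remaining):
        raise ValueError("not enough features with positive weight")
    chosen = []
    for _ in range(size):
        # Weighted draw among the features not yet selected.
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


rng = random.Random(42)

# Five features; only features 2 and 4 carry any weight.
weights = [0.0, 0.0, 5.0, 0.0, 1.0]
subspace = sample_weighted_subspace(weights, size=2, rng=rng)

# One strongly weighted feature among ten weak ones: count how often it is
# picked when the subspace holds a single feature.
skewed = [100.0] + [1.0] * 10
picks_of_feature_0 = sum(
    sample_weighted_subspace(skewed, size=1, rng=rng)[0] == 0 for _ in range(1000)
)
```

Under uniform sampling, feature 0 in the second example would be picked about one time in eleven; with the placeholder weights it is picked in the large majority of draws, which is the effect the weighting is meant to achieve for informative features.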

Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS

Journal of Environmental Management ◽

10.1016/j.jenvman.2018.11.110 ◽

2019 ◽

Vol 232 ◽

pp. 928-942 ◽

Cited By ~ 46

Author(s):

Alireza Arabameri ◽

Biswajeet Pradhan ◽

Khalil Rezaei

Keyword(s):

Random Forest ◽

Geographically Weighted Regression ◽

Gully Erosion ◽

Weighted Regression ◽

Certainty Factor ◽

Forest Models ◽

Random Forest Models
