Segmentation and Identification of Vertebrae in CT Scans Using CNN, k-Means Clustering and k-NN

Nicola Altini; Giuseppe De Giosa; Nicola Fragasso; Claudia Coscia; Elena Sibilano; Berardino Prencipe; Sardar Mehboob Hussain; Antonio Brunetti; Domenico Buongiorno; Andrea Guerriero; Ilaria Sabina Tatò; Gioacchino Brunetti; Vito Triggiani; Vitoantonio Bevilacqua

doi:10.3390/informatics8020040

Informatics ◽

10.3390/informatics8020040 ◽

2021 ◽

Vol 8 (2) ◽

pp. 40

Author(s):

Nicola Altini ◽

Giuseppe De Giosa ◽

Nicola Fragasso ◽

Claudia Coscia ◽

Elena Sibilano ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Machine Learning Algorithms ◽

Ct Scans ◽

Computer Assisted ◽

Dice Coefficient ◽

Labeling Algorithms ◽

Two Phases ◽

Vertebrae Segmentation ◽

Whole Spine

The accurate segmentation and identification of vertebrae provides the foundation for spine analysis, including fractures, malfunctions and other visual insights. The large-scale vertebrae segmentation challenge (VerSe), organized as a competition at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference, is aimed at vertebrae segmentation and labeling. In this paper, we propose a framework that addresses the tasks of vertebrae segmentation and identification by exploiting both deep learning and classical machine learning methodologies. The proposed solution comprises two phases: a fully automated binary segmentation of the whole spine, which exploits a 3D convolutional neural network, and a semi-automated procedure that locates vertebrae centroids using traditional machine learning algorithms. Unlike other approaches, the proposed method has the added advantage of requiring no single-vertebra-level annotations for training. A dataset of 214 CT scans has been extracted from VerSe’20 challenge data for training, validating and testing the proposed approach. In addition, to evaluate the robustness of the segmentation and labeling algorithms, 12 CT scans from subjects affected by severe, moderate and mild scoliosis have been collected from a local medical clinic. On the designated test set from VerSe’20 data, the binary spine segmentation stage obtained a binary Dice coefficient of 89.17%, whilst the vertebrae identification stage reached an average multi-class Dice coefficient of 90.09%. To ensure the reproducibility of the algorithms developed here, the code has been made publicly available.
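Both scores quoted in this abstract are Dice coefficients. The following is a minimal sketch of how a binary Dice and a class-averaged multi-class Dice can be computed over flattened label sequences. It is not the authors' released code: the function names, the treatment of label 0 as background, and the choice to average only over labels present in the ground truth are assumptions made for illustration.

```python
def dice_binary(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two equal-length 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def dice_multiclass(pred, truth, background=0):
    """Mean per-class Dice over the labels present in the ground truth."""
    labels = sorted(set(truth) - {background})
    if not labels:
        return 1.0
    scores = [
        dice_binary([p == c for p in pred], [t == c for t in truth])
        for c in labels
    ]
    return sum(scores) / len(scores)


# Toy 1D "volume" (made-up data): 0 = background, 1 and 2 = vertebra labels.
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 2, 2, 2, 0, 0]
spine_dice = dice_binary([p > 0 for p in pred], [t > 0 for t in truth])
label_dice = dice_multiclass(pred, truth)
```

In the toy data the spine mask is recovered exactly while one voxel is assigned to the wrong vertebra, so the binary score is higher than the multi-class one. This shows that the two numbers measure different things and need not be ordered in any particular way.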

Automatic vertebrae localization and segmentation in CT with a two-stage Dense-U-Net

Scientific Reports ◽

10.1038/s41598-021-01296-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Pengfei Cheng ◽

Yusheng Yang ◽

Huiqiang Yu ◽

Yongyi He

Keyword(s):

Detection Rate ◽

Region Of Interest ◽

Computer Assisted Surgery ◽

Computer Assisted ◽

Location Error ◽

Dice Coefficient ◽

Two Stage ◽

Second Stage ◽

Vertebrae Segmentation ◽

Whole Spine

Automatic vertebrae localization and segmentation in computed tomography (CT) are fundamental for spinal image analysis and for spine surgery with computer-assisted surgery systems. However, both remain challenging due to high variation in spinal anatomy among patients. In this paper, we propose a deep-learning approach for automatic CT vertebrae localization and segmentation with a two-stage Dense-U-Net. The first stage uses a 2D-Dense-U-Net to localize vertebrae by detecting the vertebrae centroids with dense labels and 2D slices. The second stage segments each vertebra within a region of interest identified from its centroid, using a 3D-Dense-U-Net. Finally, the segmented vertebrae are merged into a complete spine and resampled to the original resolution. We evaluated our method on the dataset from the CSI 2014 Workshop with 6 metrics: location error (1.69 ± 0.78 mm) and detection rate (100%) for vertebrae localization; Dice coefficient (0.953 ± 0.014), intersection over union (0.911 ± 0.025), Hausdorff distance (4.013 ± 2.128 mm) and pixel accuracy (0.998 ± 0.001) for vertebrae segmentation. The experimental results demonstrate the efficiency of the proposed method. Furthermore, evaluation on the dataset from the xVertSeg challenge, with location error (4.12 ± 2.31), detection rate (100%) and Dice coefficient (0.877 ± 0.035), shows the generalizability of our method. In summary, our solution localizes the vertebrae by detecting their centroids and performs instance segmentation of vertebrae in the whole spine.
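Three of the metrics named in this abstract (location error, intersection over union, and Hausdorff distance) have compact definitions. The sketch below computes them on small sets of voxel coordinates. It is an illustration only, not the paper's evaluation code: the function names are hypothetical, masks are represented as lists of integer `(x, y, z)` tuples, and distances come out in voxel units unless the coordinates are first scaled by the voxel spacing.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def location_error(pred_centroid, true_centroid):
    """Euclidean distance between predicted and reference centroids."""
    return math.dist(pred_centroid, true_centroid)


def iou(a, b):
    """Intersection over union of two voxel sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Made-up masks: the prediction is the reference shifted by one voxel in x.
truth = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
pred = [(1, 0, 0), (2, 0, 0), (3, 0, 0)]
loc_err = location_error(centroid(pred), centroid(truth))
overlap = iou(pred, truth)
hd = hausdorff(pred, truth)
```

The brute-force Hausdorff loop is quadratic in the number of points, which is fine for a toy example but would normally be replaced by a distance-transform or spatial-index approach on full CT volumes.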

Efficient Image Retrieval approach for Large-scale Chest X Ray data using Hand-Crafted Features and Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i11.890896 ◽

2018 ◽

Vol 6 (11) ◽

pp. 890-896

Author(s):

Irene Getzi S ◽

D. Christopher Durairaj ◽

V Joseph Raj

Keyword(s):

Machine Learning ◽

Image Retrieval ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

X Ray ◽

Chest X Ray

Clinician checklist for assessing suitability of machine learning applications in healthcare

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100251 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100251

Author(s):

Ian Scott ◽

Stacey Carter ◽

Enrico Coiera

Keyword(s):

Machine Learning ◽

Large Scale ◽

Clinical Decision Making ◽

Improve Patient Care ◽

Clinical Decision ◽

Routine Care ◽

Machine Learning Algorithms ◽

Clinical Settings ◽

Machine Learning Applications ◽

Key Issues

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and identify situations where further refinement and evaluation is required prior to large-scale use.

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.

Methods: In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform.

Results: We have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.

Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.
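The results are stated in terms of precision across recall values. As a hedged illustration of what that comparison involves, the sketch below builds a precision-recall curve from ranked prediction scores. It is a generic construction and says nothing about how SHERPA itself is evaluated: the function name, the scores, and the labels are all made up, and ties in score are simply broken by input order.

```python
def precision_recall_curve(scores, labels):
    """Return (recall, precision) pairs, one per rank cutoff.

    scores: model scores, higher meaning "more likely presented".
    labels: 1 for an observed (presented) peptide, 0 for a decoy.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    # Stable sort: ties keep their input order (a simplification).
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    curve, true_pos = [], 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        curve.append((true_pos / positives, true_pos / rank))
    return curve


# Hypothetical scores for four peptides, two of which are true positives.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
curve = precision_recall_curve(scores, labels)
```

One model can be said to dominate another "across all recall values" when, at every recall level reached, its precision is at least as high; comparing two such curves point by point is one way to check that claim.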

Essentiality of Machine Learning Algorithms for Big Data Computation

Advances in Data Mining and Database Management - Managing and Processing Big Data in Cloud Computing ◽

10.4018/978-1-4666-9767-6.ch011 ◽

2016 ◽

pp. 156-167

Author(s):

Manjunath Thimmasandra Narayanapppa ◽

T. P. Puneeth Kumar ◽

Ravindra S. Hegadi

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Learning Algorithms ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Real Time Analysis ◽

Large Scale Data ◽

Computational Environment ◽

Large Scale Data Processing

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, and internet and supply chain systems) over the past decade. The term big data was coined to capture the meaning of this emerging trend. In addition to its huge volume, big data also exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage and transmission, and for large-scale data processing mechanisms. In recent years, the analytics industry's interest has expanded toward big data analytics to uncover potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including hardware and software, that is required to perform analytics on big data.

Large-Scale Machine Learning Algorithms for Biomedical Data Science

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '19 ◽

10.1145/3307339.3342130 ◽

2019 ◽

Author(s):

Heng Huang

Keyword(s):

Machine Learning ◽

Large Scale ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Biomedical Data

Big Data’s Role in Health and Risk Messaging

Oxford Research Encyclopedia of Communication ◽

10.1093/acrefore/9780190228613.013.359 ◽

2017 ◽

Author(s):

Bradford William Hesse

Keyword(s):

Machine Learning ◽

Big Data ◽

Risk Communication ◽

Large Scale ◽

Protein Identification ◽

Machine Learning Algorithms ◽

National Committee ◽

Learning Approaches ◽

Road Map ◽

Data Flows

The presence of large-scale data systems can be felt, consciously or not, in almost every facet of modern life, whether through the simple act of selecting travel options online, purchasing products from online retailers, or navigating through the streets of an unfamiliar neighborhood using global positioning system (GPS) mapping. These systems operate through the momentum of big data, a term introduced by data scientists to describe a data-rich environment enabled by a superconvergence of advanced computer-processing speeds and storage capacities; advanced connectivity between people and devices through the Internet; the ubiquity of smart, mobile devices and wireless sensors; and the creation of accelerated data flows among systems in the global economy. Some researchers have suggested that big data represents the so-called fourth paradigm in science, wherein the first paradigm was marked by the evolution of the experimental method, the second was brought about by the maturation of theory, the third was marked by an evolution of statistical methodology as enabled by computational technology, while the fourth extended the benefits of the first three, but also enabled the application of novel machine-learning approaches to an evidence stream that exists in high volume, high velocity, high variety, and differing levels of veracity. In public health and medicine, the emergence of big data capabilities has followed naturally from the expansion of data streams from genome sequencing, protein identification, environmental surveillance, and passive patient sensing. In 2001, the National Committee on Vital and Health Statistics published a road map for connecting these evidence streams to each other through a national health information infrastructure. Since then, the road map has spurred national investments in electronic health records (EHRs) and motivated the integration of public surveillance data into analytic platforms for health situational awareness. 
More recently, the boom in consumer-oriented mobile applications and wireless medical sensing devices has opened up the possibility for mining new data flows directly from altruistic patients. In the broader public communication sphere, the ability to mine the digital traces of conversation on social media presents an opportunity to apply advanced machine learning algorithms as a way of tracking the diffusion of risk communication messages. In addition to utilizing big data for improving the scientific knowledge base in risk communication, there will be a need for health communication scientists and practitioners to work as part of interdisciplinary teams to improve the interfaces to these data for professionals and the public. Too much data, presented in disorganized ways, can lead to what some have referred to as “data smog.” Much work will be needed for understanding how to turn big data into knowledge, and just as important, how to turn data-informed knowledge into action.

Automatic Pulmonary Nodule Detection Applying Deep Learning or Machine Learning Algorithms to the LIDC-IDRI Database: A Systematic Review

Diagnostics ◽

10.3390/diagnostics9010029 ◽

2019 ◽

Vol 9 (1) ◽

pp. 29 ◽

Cited By ~ 20

Author(s):

Lea Pehrson ◽

Michael Nielsen ◽

Carsten Ammitzbøl Lauridsen

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Deep Learning ◽

Machine Learning Algorithms ◽

Ct Scans ◽

Lung Nodules ◽

Original Research ◽

Feature Based ◽

High Level ◽

Meta Analyses

The aim of this study was to provide an overview of the literature available on machine learning (ML) algorithms applied to the Lung Image Database Consortium Image Collection (LIDC-IDRI) database as a tool for the optimization of detecting lung nodules in thoracic CT scans. This systematic review was compiled according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Only original research articles concerning algorithms applied to the LIDC-IDRI database were included. The initial search yielded 1972 publications after removing duplicates, and 41 of these articles were included in this study. The articles were divided into two subcategories describing their overall architecture. The majority of feature-based algorithms achieved an accuracy >90%, compared to the deep learning (DL) algorithms, which achieved an accuracy in the range of 82.2%–97.6%. In conclusion, ML and DL algorithms are able to detect lung nodules with a high level of accuracy, sensitivity, and specificity when applied to an annotated archive of CT scans of the lung. However, there is no consensus on the method applied to determine the efficiency of ML algorithms.
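Accuracy, sensitivity, and specificity are all derived from the same four confusion-matrix counts, which is part of why results reported with different metrics are hard to compare. The following sketch shows the standard definitions. The function name and the counts are hypothetical and are not taken from any study in the review.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of true nodules detected
        "specificity": tn / (tn + fp),  # share of non-nodules rejected
    }


# Made-up counts: 50 nodules and 100 non-nodule candidates.
metrics = detection_metrics(tp=40, fp=5, tn=95, fn=10)
```

With these counts the accuracy is 0.90 while the sensitivity is only 0.80, so a method that looks strong on accuracy alone can still miss one nodule in five.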

Erratum to: Combining semi-automated image analysis techniques with machine learning algorithms to accelerate large-scale genetic studies

GigaScience ◽

10.1093/gigascience/giy043 ◽

2018 ◽

Vol 7 (7) ◽

Author(s):

Jonathan A Atkinson ◽

Guillaume Lobet ◽

Manuel Noll ◽

Patrick E Meyer ◽

Marcus Griffiths ◽

...

Keyword(s):

Machine Learning ◽

Image Analysis ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Automated Image Analysis ◽

Genetic Studies ◽

Analysis Techniques ◽

Image Analysis Techniques

Neural Network Potentials: A Concise Overview of Methods

Annual Review of Physical Chemistry ◽

10.1146/annurev-physchem-082720-034254 ◽

2022 ◽

Vol 73 (1) ◽

Author(s):

Emir Kocer ◽

Tsz Wai Ko ◽

Jörg Behler

Keyword(s):

Machine Learning ◽

Atomic Structure ◽

Electrostatic Interactions ◽

Large Scale ◽

Machine Learning Algorithms ◽

Annual Review ◽

Publication Date ◽

Great Success ◽

Concise Overview ◽

Wide Range

In the past two decades, machine learning potentials (MLPs) have reached a level of maturity that now enables applications to large-scale atomistic simulations of a wide range of systems in chemistry, physics, and materials science. Different machine learning algorithms have been used with great success in the construction of these MLPs. In this review, we discuss an important group of MLPs relying on artificial neural networks to establish a mapping from the atomic structure to the potential energy. In spite of this common feature, there are important conceptual differences among MLPs, which concern the dimensionality of the systems, the inclusion of long-range electrostatic interactions, global phenomena like nonlocal charge transfer, and the type of descriptor used to represent the atomic structure, which can be either predefined or learnable. A concise overview is given along with a discussion of the open challenges in the field. Expected final online publication date for the Annual Review of Physical Chemistry, Volume 73 is April 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
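To make the idea of mapping atomic structure to potential energy concrete, the sketch below uses one common construction: a predefined radial descriptor per atom, a small feed-forward network that turns each descriptor into an atomic energy, and a sum over atoms. It is a toy illustration under several assumptions and is not taken from the review: all atoms share one network (a single element type), the weights are arbitrary and untrained, the descriptor is purely radial, and the names, `etas` values, and cutoff radius are hypothetical.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff: decays smoothly to 0 at r = r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_descriptor(positions, i, etas, r_c):
    """Predefined radial descriptor for atom i: one value per eta."""
    values = []
    for eta in etas:
        g = 0.0
        for j, pos_j in enumerate(positions):
            if j != i:
                r = math.dist(positions[i], pos_j)
                g += math.exp(-eta * r * r) * cutoff(r, r_c)
        values.append(g)
    return values


def atomic_energy(descriptor, w_hidden, b_hidden, w_out, b_out):
    """Tiny feed-forward network: descriptor -> one atomic energy."""
    hidden = [
        math.tanh(sum(w * g for w, g in zip(row, descriptor)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def total_energy(positions, params, etas=(0.5, 2.0), r_c=6.0):
    """Total energy as a sum of per-atom network outputs."""
    return sum(
        atomic_energy(radial_descriptor(positions, i, etas, r_c), *params)
        for i in range(len(positions))
    )


# Arbitrary, untrained weights (2 inputs -> 2 hidden units -> 1 output).
params = ([[0.3, -0.2], [0.1, 0.4]], [0.0, 0.1], [0.5, -0.7], 0.05)
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
energy = total_energy(atoms, params)
```

Because the descriptor depends only on interatomic distances and the total is a sum over atoms, the output is unchanged when the structure is translated or the atoms are listed in a different order. The numerical value itself is meaningless here, since the weights were never fitted to reference data.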
