Scalable storage of whole slide images and fast retrieval of tiles using Apache Spark

Pathology image-based lung cancer subtyping using deep-learning features and cell-density maps

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-064 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 64-1-64-5

Author(s):

Mustafa I. Jaber ◽

Christopher W. Szeto ◽

Bing Song ◽

Liudmila Beziaeva ◽

Stephen C. Benz ◽

...

Keyword(s):

Lung Cancer ◽

Cell Density ◽

Majority Voting ◽

Training Set ◽

Density Maps ◽

Color Deconvolution ◽

Map Generation ◽

Density Map ◽

Pathology Image ◽

Whole Slide Images

In this paper, we propose a patch-based system to classify non-small cell lung cancer (NSCLC) diagnostic whole slide images (WSIs) into two major histopathological subtypes: adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC). Classifying patients accurately is important for prognosis and therapy decisions. The proposed system was trained and tested on 876 subtyped NSCLC gigapixel-resolution diagnostic WSIs from 805 patients – 664 in the training set and 141 in the test set. The algorithm has modules for: 1) auto-generated tumor/non-tumor masking using a trained residual neural network (ResNet34), 2) cell-density map generation (based on color deconvolution, local drain segmentation, and watershed transformation), 3) patch-level feature extraction using a pre-trained ResNet34, 4) a tower of linear SVMs for different cell ranges, and 5) a majority voting module for aggregating subtype predictions in unseen testing WSIs. The proposed system was trained and tested on several WSI magnifications ranging from x4 to x40 with a best ROC AUC of 0.95 and an accuracy of 0.86 in test samples. This fully-automated histopathology subtyping method outperforms similar published state-of-the-art methods for diagnostic WSIs.
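
The majority voting module mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the implementation, so the function name `majority_vote`, the label strings, and the tie-breaking behavior below are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(patch_labels):
    """Aggregate patch-level subtype predictions into one slide-level label.

    Hypothetical helper: the paper's actual aggregation details are not
    given in the abstract.
    """
    if not patch_labels:
        raise ValueError("no patch predictions to aggregate")
    counts = Counter(patch_labels)
    # most_common(1) returns a one-element list of (label, count);
    # on a tie, the label that was seen first wins.
    label, _ = counts.most_common(1)[0]
    return label


# Example: five patch predictions from one whole slide image.
slide_label = majority_vote(["LUAD", "LUSC", "LUAD", "LUAD", "LUSC"])
print(slide_label)
```

In a real system, a tie or a narrow margin would likely be handled explicitly (for example, by falling back to averaged classifier scores), which this sketch does not attempt.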

Analysis of Retail Data using Apache Spark

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.11621165 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1162-1165

Author(s):

Himani Agnihotri ◽

Bharti Nagpal

Keyword(s):

Apache Spark

DISTRIBUTED PROCESSING OF LARGE VOLUMES OF TRANSACTIONAL DATA

Naukovyi visnyk Donetskoho natsionalnoho tekhnichnoho universytetu ◽

10.31474/2415-7902-2020-1(4)-2(5)-27-36 ◽

2020 ◽

pp. 27-36

Author(s):

O. Dmytriieva ◽

D. Nikulin

Keyword(s):

Distributed Processing ◽

Apache Spark ◽

Hadoop Mapreduce ◽

Transactional Data

This work addresses distributed transaction processing in the analysis of large data volumes for the purpose of mining association rules. Building on the well-known data mining algorithms for finding frequent itemsets, AIS and Apriori, possible parallelization options were identified that avoid iterative database scans and high memory consumption. The feasibility of moving the computations to different platforms that support parallel data processing was investigated. The computing platforms chosen were MapReduce, a powerful framework for processing large distributed datasets on a Hadoop cluster, and Apache Spark, a software tool for processing extremely large amounts of data. A comparative performance analysis of the methods considered was carried out, recommendations for the effective use of parallel computing platforms were obtained, and modifications of the association rule mining algorithms were proposed. The main tasks accomplished in the work were: studying modern tools for distributed processing of structured and unstructured data; deploying a test cluster in a cloud service; developing scripts to automate cluster deployment; modifying distributed algorithms to adapt them to the required distributed computing frameworks; measuring data processing performance in sequential and distributed modes using Hadoop MapReduce and Apache Spark; comparing the results of the performance test measurements; obtaining and substantiating the relationship between the amount of data processed and the processing time; optimizing distributed association rule mining algorithms for large volumes of transactional data; and measuring the performance of distributed processing with existing software tools. Keywords: distributed processing, transactional data, association rules, computing cluster, Hadoop, MapReduce, Apache Spark
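
As a rough illustration of counting frequent itemsets in a single pass over the transactions, the sketch below uses a map-style step (emit every k-item combination of a transaction) followed by a reduce-style step (sum the counts). It is plain Python, not Hadoop MapReduce or Apache Spark code, and it is a brute-force count rather than Apriori with candidate pruning. The function name `frequent_itemsets` and the sample data are assumptions, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, k, min_support):
    """Return k-itemsets that occur in at least min_support transactions.

    Hypothetical helper for illustration; not the authors' algorithm.
    """
    counts = Counter()
    for transaction in transactions:
        # "Map" step: emit each k-item combination once per transaction.
        # Sorting makes the tuple for a given itemset canonical.
        for itemset in combinations(sorted(set(transaction)), k):
            # "Reduce" step: sum the occurrences of each itemset.
            counts[itemset] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]
print(frequent_itemsets(transactions, k=2, min_support=2))
```

Because each transaction is processed independently before the counts are summed, the same structure maps naturally onto a distributed framework, at the cost of emitting many combinations for long transactions.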

Increase the Performance of K-Means Clustering Algorithm Using Apache Spark

The International Journal of Internet of Things and its Applications ◽

10.21742/ijiota.2017.1.1.02 ◽

2017 ◽

Vol 1 (1) ◽

pp. 13-28 ◽

Cited By ~ 1

Author(s):

Chang Xie

Keyword(s):

Clustering Algorithm ◽

Apache Spark

Exploring Apache Spark Data APIs for Water Big Data Management

Advances in Intelligent Systems and Computing - Advanced Intelligent Systems for Sustainable Development (AI2SD’2018) ◽

10.1007/978-3-030-11881-5_10 ◽

2019 ◽

pp. 105-117

Author(s):

Nassif El Hassane ◽

Hicham Hajji

Keyword(s):

Big Data ◽

Data Management ◽

Apache Spark

A Digital Pathology Solution to Resolve the Tissue Floater Conundrum

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2020-0034-oa ◽

2020 ◽

Author(s):

Liron Pantanowitz ◽

Pamela Michelow ◽

Scott Hazelhurst ◽

Shivam Kalra ◽

Charles Choi ◽

...

Keyword(s):

Digital Pathology ◽

Primary Diagnosis ◽

Image Search ◽

Molecular Tests ◽

Tissue Identification ◽

Digital Database ◽

Glass Slides ◽

Search Tool ◽

Hematoxylin And Eosin ◽

Whole Slide Images

Context.— Pathologists may encounter extraneous pieces of tissue (tissue floaters) on glass slides because of specimen cross-contamination. Troubleshooting this problem, including performing molecular tests for tissue identification if available, is time consuming and often does not satisfactorily resolve the problem. Objective.— To demonstrate the feasibility of using an image search tool to resolve the tissue floater conundrum. Design.— A glass slide was produced containing 2 separate hematoxylin and eosin (H&E)-stained tissue floaters. This fabricated slide was digitized along with the 2 slides containing the original tumors used to create these floaters. These slides were then embedded into a dataset of 2325 whole slide images comprising a wide variety of H&E-stained diagnostic entities. Digital slides were broken up into patches and the patch features converted into barcodes for indexing and easy retrieval. A deep learning-based image search tool was employed to extract features from patches via barcodes, hence enabling image matching to each tissue floater. Results.— There was a very high likelihood of finding a correct tumor match for the queried tissue floater when searching the digital database. Search results repeatedly yielded a correct match within the top 3 retrieved images. The retrieval accuracy improved when greater proportions of the floater were selected. Each search completed within several milliseconds. Conclusions.— Using an image search tool offers pathologists an additional method to rapidly resolve the tissue floater conundrum, especially for those laboratories that have transitioned to going fully digital for primary diagnosis.
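
The barcode-based retrieval described in the Design section can be sketched as a nearest-neighbor search over binary codes. The abstract does not say how barcodes are built or compared, so the bit-string representation, the Hamming distance metric, and the names `hamming` and `top_matches` are all assumptions for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def top_matches(query: str, index: dict, k: int = 3) -> list:
    """Return the ids of the k indexed barcodes closest to the query.

    Hypothetical helper: `index` maps an image id to its barcode string.
    """
    # sorted() is stable, so equally distant entries keep insertion order.
    ranked = sorted(index.items(), key=lambda item: hamming(query, item[1]))
    return [image_id for image_id, _ in ranked[:k]]


# Toy index of four made-up 8-bit barcodes.
index = {
    "slide_a": "10110010",
    "slide_b": "10110011",
    "slide_c": "01001101",
    "slide_d": "10100010",
}
print(top_matches("10110010", index))
```

A linear scan like this is only suitable for small collections; a production search tool would need an index structure suited to binary codes.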
