SelectStitch: Automated Frame Segmentation and Stitching to Create Composite Images from Otoscope Video Clips

2020 ◽  
Vol 10 (17) ◽  
pp. 5894
Author(s):  
Hamidullah Binol ◽  
Aaron C. Moberly ◽  
Muhammad Khalid Khan Niazi ◽  
Garth Essig ◽  
Jay Shah ◽  
...  

Background and Objective: The aim of this study is to develop and validate an automated image-segmentation-based frame selection and stitching framework that creates enhanced composite images from otoscope videos. The proposed framework, called SelectStitch, is useful for classifying eardrum abnormalities from a single composite image instead of the entire raw otoscope video. Methods: SelectStitch consists of a convolutional neural network (CNN) based semantic segmentation approach that detects the eardrum in each frame of an otoscope video, and a stitching engine that generates a high-quality composite image from the detected eardrum regions. We used two separate datasets: the first, containing 36 otoscope videos, was used to train the semantic segmentation model; the second, containing 100 videos, was used to test the proposed method. Cases from both adult and pediatric patients were included. A four-level-deep U-Net architecture was trained on the first dataset to automatically find eardrum regions in each otoscope video frame. After segmentation, meaningful frames were selected automatically using a pre-defined threshold: a frame was retained only if its detected eardrum region covered at least 20% of the frame area. We generated 100 composite images from the test dataset. Three ear, nose, and throat (ENT) specialists (ENT-I, ENT-II, ENT-III) compared, in two rounds, the composite images produced by SelectStitch against baseline composite images generated by stitching all frames of the same video, in terms of their diagnostic capabilities. Results: In the first round, ENT-I, ENT-II, and ENT-III graded 58, 57, and 71 of the 100 SelectStitch composites, respectively, as improvements over the baseline composites, reflecting greater diagnostic capability. In the repeat assessment, these numbers were 56, 56, and 64, respectively.
Only 6%, 3%, and 3% of cases received a lower score than the baseline composite images for ENT-I, ENT-II, and ENT-III in Round 1, and 4%, 0%, and 2% of cases in Round 2. Conclusions: We conclude that frame selection and stitching increase the probability of detecting a lesion even if it appears in only a few frames.
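The frame-selection rule described above (keep a frame only if the detected eardrum covers at least 20% of it) can be sketched as follows; the binary-mask representation and function name are assumptions for illustration, not the paper's implementation:

```python
def select_frames(masks, min_area_ratio=0.20):
    """Return indices of frames whose eardrum mask covers at least
    `min_area_ratio` of the frame area (sketch of the selection rule)."""
    selected = []
    for idx, mask in enumerate(masks):
        total = len(mask) * len(mask[0])                  # frame size in pixels
        eardrum = sum(px for row in mask for px in row)   # binary mask: 1 = eardrum
        if eardrum / total >= min_area_ratio:
            selected.append(idx)
    return selected
```

Only the selected frames are then passed to the stitching engine, which is what lets a lesion visible in just a few frames survive into the composite.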


2016 ◽  
pp. 8-13
Author(s):  
Daniel Reynolds ◽  
Richard A. Messner

Video copy detection is the process of comparing and analyzing videos to measure their similarity and determine whether they are copies, modified versions, or entirely different videos. Because video frame sizes are increasing rapidly, a data-reduction step is important for achieving fast video comparisons. Moreover, detecting the streaming and storage of both legal and illegal video data requires fast, efficient implementations of video copy detection algorithms. In this paper, several commonly used video copy detection algorithms are implemented with the Log-Polar transformation as a pre-processing step that reduces the frame size prior to signature calculation. Two global-signature algorithms were chosen to validate Log-Polar as an acceptable data-reduction stage. The results of this research demonstrate that this pre-processing step significantly reduces the computation time of the overall video copy detection process while not significantly affecting the detection accuracy of the algorithm used for detection.
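A minimal sketch of the Log-Polar resampling step described above: each frame is remapped onto a small (rho, theta) grid, sampling densely near the image centre and sparsely at the edges, so a large frame shrinks to a fixed-size signature input. The output dimensions and nearest-neighbour sampling are illustrative assumptions, not the paper's settings:

```python
import math

def log_polar(frame, out_rho=32, out_theta=32):
    """Resample a 2-D grayscale frame onto a log-polar grid centred on
    the image centre (nearest-neighbour sampling, clamped at borders)."""
    h, w = len(frame), len(frame[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = math.hypot(cy, cx)
    out = []
    for i in range(out_rho):
        # logarithmic radius: small i samples densely near the centre
        r = math.exp(i / (out_rho - 1) * math.log(max_r + 1)) - 1
        row = []
        for j in range(out_theta):
            t = 2 * math.pi * j / out_theta
            y = min(max(int(round(cy + r * math.sin(t))), 0), h - 1)
            x = min(max(int(round(cx + r * math.cos(t))), 0), w - 1)
            row.append(frame[y][x])
        out.append(row)
    return out
```

Whatever the original frame resolution, the signature is now computed on a fixed 32×32 grid, which is where the reported speed-up comes from.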


Complexity ◽  
2017 ◽  
Vol 2017 ◽  
pp. 1-12
Author(s):  
Sima Ahmadpour ◽  
Tat-Chee Wan ◽  
Zohreh Toghrayee ◽  
Fariba HematiGazafi

Designing an effective, high-performance network requires accurate characterization and modeling of network traffic. Models of video frame sizes are commonly applied in simulation studies, in mathematical analysis, and in generating streams for testing and compliance purposes. Moreover, video traffic is expected to be a major source of multimedia traffic in future heterogeneous networks, so the statistical distribution of video data can serve as an input for network performance modeling. This paper identifies a theoretical distribution that matches the statistical properties of a video trace and selects the best fit using both a graphical method and a hypothesis test. The dataset used in this article consists of layered video traces generated with the Scalable Video Codec (SVC) compression technique from three different movies.
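As an illustration of fitting a candidate distribution to a frame-size trace, the sketch below estimates lognormal parameters by the method of moments on log-transformed sizes. The choice of lognormal and the estimator are assumptions for the example; the paper compares several candidates via graphical methods and a hypothesis test:

```python
import math

def fit_lognormal(frame_sizes):
    """Estimate lognormal parameters (mu, sigma) from a list of
    positive frame sizes via moments of the log-transformed data."""
    logs = [math.log(s) for s in frame_sizes]
    n = len(logs)
    mu = sum(logs) / n                               # mean of log sizes
    var = sum((x - mu) ** 2 for x in logs) / n       # population variance
    return mu, math.sqrt(var)
```

The fitted (mu, sigma) pair can then drive a synthetic traffic generator or feed a goodness-of-fit test against the empirical trace.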


2019 ◽  
Vol 10 (3) ◽  
pp. 2426-2432 ◽  
Author(s):  
Arjun ◽  
Kanchana V

The spinal cord plays an important role in human life. Using digital image processing techniques, the interior of the human body can be analyzed with MRI, CT, X-ray, and similar modalities, and medical image processing is therefore used extensively in the medical field. In this work, we use MRI images to detect degenerative disease of the spinal cord. We first preprocess the MRI image and locate the degenerative part of the spinal cord using various segmentation approaches, and then classify each case as degenerative disease or normal spinal cord using various classification algorithms. For segmentation, we use an efficient semantic segmentation approach.


Author(s):  
Arcadi Llanza ◽  
Assan Sanogo ◽  
Marouan Khata ◽  
Alami Khalil ◽  
Nadiya Shvai ◽  
...  

2018 ◽  
Vol 7 (2.5) ◽  
pp. 1
Author(s):  
Khalil Khan ◽  
Nasir Ahmad ◽  
Irfan Uddin ◽  
Muhammad Ehsan Mazhar ◽  
Rehan Ullah Khan

Background and Objective: A novel face parsing method is proposed in this paper which partitions a facial image into six semantic classes. Unlike previous approaches, which segmented a facial image into three or four classes, we extend the class labels to six. Materials and Methods: A dataset of 464 images taken from the FEI, MIT-CBCL, Pointing’04, and SiblingsDB databases was annotated. A discriminative model was trained on features extracted from squared patches. The model was tested with two different semantic segmentation approaches: pixel-based and super-pixel-based semantic segmentation (PB_SS and SPB_SS). Results: Pixel labeling accuracies (PLA) of 94.68% and 90.35% were obtained with the PB_SS and SPB_SS methods, respectively, on frontal images. Conclusions: The proposed face-parsing method efficiently segments a facial image into its constituent parts.
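The patch-based feature extraction underlying pixel-based labelling can be sketched as follows: every pixel is described by the flattened squared patch centred on it, and a discriminative classifier is trained on those vectors. The patch size and zero-padding at borders are illustrative assumptions, not the paper's settings:

```python
def extract_patches(image, patch=3):
    """Return one feature vector per pixel: the flattened `patch` x `patch`
    neighbourhood around it, with out-of-image positions zero-padded."""
    h, w, r = len(image), len(image[0]), patch // 2
    feats = []
    for y in range(h):
        for x in range(w):
            vec = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    vec.append(image[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0)
            feats.append(vec)
    return feats
```

Each feature vector is then labelled with one of the six semantic classes; the super-pixel variant instead pools these features over a super-pixel before classification.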


Author(s):  
A. Adam ◽  
L. Grammatikopoulos ◽  
G. Karras ◽  
E. Protopapadakis ◽  
K. Karantzalos

Abstract. 3D semantic segmentation is the joint task of partitioning a point cloud into semantically consistent 3D regions and assigning each region a semantic class/label. While traditional approaches to 3D semantic segmentation typically rely only on structural information about the objects (i.e., object geometry and shape), in recent years many techniques combining visual and geometric features have emerged, taking advantage of progress in SfM/MVS algorithms that reconstruct point clouds from multiple overlapping images. Our work describes a hybrid methodology for 3D semantic segmentation that operates in both 2D and 3D space and explores whether image selection is critical to the accuracy of 3D semantic segmentation of point clouds. Experimental results are demonstrated on a freely available online dataset depicting city blocks around Paris. The experimental procedure not only validates that hybrid (geometric and visual) features achieve more accurate semantic segmentation, but also demonstrates the importance of selecting the most appropriate view for 2D feature extraction.
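The core idea of hybrid features is simply to concatenate per-point geometric descriptors with visual descriptors projected back from the selected image. A minimal sketch, where the particular descriptors (point height and normalized RGB) and the point-record layout are assumptions for illustration rather than the paper's exact feature set:

```python
def hybrid_features(points):
    """Build one hybrid feature vector per 3-D point by concatenating a
    geometric descriptor (height z) with visual descriptors (RGB in [0,1])."""
    feats = []
    for p in points:
        # p: dict with 'xyz' (coordinates) and 'rgb' (0-255, sampled from
        # the image chosen for this point by the view-selection step)
        x, y, z = p['xyz']
        r, g, b = (c / 255.0 for c in p['rgb'])   # normalise colour
        feats.append([z, r, g, b])                # geometric + visual vector
    return feats
```

A point-cloud classifier trained on these vectors sees both shape cues (via z) and appearance cues (via RGB), which is what the reported accuracy gain rests on; which image supplies the RGB values is exactly the view-selection question the paper investigates.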


2018 ◽  
Vol 1 (1) ◽  
Author(s):  
Zhitao Li

An audio and video decoding and synchronized playback system for MPEG-2 TS streams is designed and implemented on an ARM embedded platform. A hardware codec is embedded in the ARM processor, and to make full use of this resource the hardware MFC (multi-format codec) decoder is adopted to decode the video data, while the audio data are decoded with the open-source MAD (libmad) library. The V4L2 (Video for Linux 2) driver interface and the ALSA (Advanced Linux Sound Architecture) library are used for video and audio playback, respectively. Because the video frame playback period and the hardware processing delay are inconsistent, a time difference arises between the audio and video data operations, which causes audio and video playback to fall out of sync. We therefore synchronize video playback to the audio playback stream, keeping audio and video playback in sync. Test results show that the designed system can correctly decode and synchronize audio and video data.
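Slaving video to the audio clock, as described above, reduces to a per-frame scheduling decision: compare the frame's presentation timestamp with the current audio playback clock and wait, show, or drop accordingly. A minimal sketch; the 40 ms tolerance and the function name are assumptions, not values from the paper:

```python
def sync_action(video_pts, audio_clock, threshold=0.04):
    """Decide how to schedule the next video frame against the audio
    playback clock (audio as master). Times are in seconds."""
    drift = video_pts - audio_clock        # frame is early (+) or late (-)
    if drift > threshold:
        return 'wait'                      # ahead of audio: delay display
    if drift < -threshold:
        return 'drop'                      # behind audio: discard frame
    return 'show'                          # within tolerance: display now
```

Dropping late frames and delaying early ones keeps the video locked to the audio stream even when the hardware decoding delay varies from frame to frame.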
