SelectStitch: Automated Frame Segmentation and Stitching to Create Composite Images from Otoscope Video Clips

2020 ◽  
Vol 10 (17) ◽  
pp. 5894
Author(s):  
Hamidullah Binol ◽  
Aaron C. Moberly ◽  
Muhammad Khalid Khan Niazi ◽  
Garth Essig ◽  
Jay Shah ◽  
...  

Background and Objective: The aim of this study is to develop and validate an automated image-segmentation-based frame selection and stitching framework that creates enhanced composite images from otoscope videos. The proposed framework, called SelectStitch, is useful for classifying eardrum abnormalities from a single composite image instead of the entire raw otoscope video. Methods: SelectStitch consists of a convolutional neural network (CNN) based semantic segmentation approach that detects the eardrum in each frame of an otoscope video, and a stitching engine that generates a high-quality composite image from the detected eardrum regions. We used two separate datasets: the first, containing 36 otoscope videos, was used to train the semantic segmentation model; the second, containing 100 videos, was used to test the proposed method. Cases from both adult and pediatric patients were included. A four-level-deep U-Net architecture was trained on the first dataset to automatically find eardrum regions in each otoscope video frame. After segmentation, meaningful frames were selected automatically using a pre-defined threshold: a frame was retained only if its detected eardrum region covered at least 20% of the frame area. We generated 100 composite images from the test dataset. Three ear, nose, and throat (ENT) specialists (ENT-I, ENT-II, ENT-III) compared, in two rounds, the composite images produced by SelectStitch against baseline composite images generated by stitching all frames of the same video, in terms of their diagnostic capabilities. Results: In the first round, ENT-I, ENT-II, and ENT-III graded 58, 57, and 71 of the 100 SelectStitch composites, respectively, as improvements over the baseline composites, reflecting greater diagnostic capability. In the repeat assessment, these numbers were 56, 56, and 64, respectively.
Only 6%, 3%, and 3% of cases received a lower score than the baseline composite images for ENT-I, ENT-II, and ENT-III in Round 1, and 4%, 0%, and 2% of cases in Round 2. Conclusions: We conclude that frame selection and stitching increase the probability of detecting a lesion even if it appears in only a few frames.
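The frame-selection rule described above (keep a frame only if the detected eardrum covers at least 20% of it) can be sketched as follows; the binary-mask representation and function name are assumptions for illustration, not the paper's implementation:

```python
def select_frames(masks, min_area_ratio=0.20):
    """Return indices of frames whose eardrum mask covers at least
    `min_area_ratio` of the frame area (sketch of the selection rule)."""
    selected = []
    for idx, mask in enumerate(masks):
        total = len(mask) * len(mask[0])                  # frame size in pixels
        eardrum = sum(px for row in mask for px in row)   # binary mask: 1 = eardrum
        if eardrum / total >= min_area_ratio:
            selected.append(idx)
    return selected
```

Only the selected frames are then passed to the stitching engine, which is what lets a lesion visible in just a few frames survive into the composite.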


2016 ◽  
pp. 8-13
Author(s):  
Daniel Reynolds ◽  
Richard A. Messner

Video copy detection is the process of comparing and analyzing videos to measure their similarity and determine whether they are copies, modified versions, or entirely different videos. Because video frame sizes are increasing rapidly, a data-reduction step is important for achieving fast video comparisons. Moreover, detecting the streaming and storage of both legal and illegal video data requires fast, efficient implementations of video copy detection algorithms. In this paper, several commonly used video copy detection algorithms are implemented with the Log-Polar transformation as a pre-processing step that reduces the frame size prior to signature calculation. Two global-signature algorithms were chosen to validate Log-Polar as an acceptable data-reduction stage. The results of this research demonstrate that this pre-processing step significantly reduces the computation time of the overall video copy detection process while not significantly affecting the detection accuracy of the algorithm used for detection.
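A minimal sketch of the Log-Polar resampling step described above: each frame is remapped onto a small (rho, theta) grid, sampling densely near the image centre and sparsely at the edges, so a large frame shrinks to a fixed-size signature input. The output dimensions and nearest-neighbour sampling are illustrative assumptions, not the paper's settings:

```python
import math

def log_polar(frame, out_rho=32, out_theta=32):
    """Resample a 2-D grayscale frame onto a log-polar grid centred on
    the image centre (nearest-neighbour sampling, clamped at borders)."""
    h, w = len(frame), len(frame[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = math.hypot(cy, cx)
    out = []
    for i in range(out_rho):
        # logarithmic radius: small i samples densely near the centre
        r = math.exp(i / (out_rho - 1) * math.log(max_r + 1)) - 1
        row = []
        for j in range(out_theta):
            t = 2 * math.pi * j / out_theta
            y = min(max(int(round(cy + r * math.sin(t))), 0), h - 1)
            x = min(max(int(round(cx + r * math.cos(t))), 0), w - 1)
            row.append(frame[y][x])
        out.append(row)
    return out
```

Whatever the original frame resolution, the signature is now computed on a fixed 32×32 grid, which is where the reported speed-up comes from.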


Complexity ◽  
2017 ◽  
Vol 2017 ◽  
pp. 1-12
Author(s):  
Sima Ahmadpour ◽  
Tat-Chee Wan ◽  
Zohreh Toghrayee ◽  
Fariba HematiGazafi

Designing an effective, high-performance network requires accurate characterization and modeling of network traffic. Models of video frame sizes are commonly applied in simulation studies, in mathematical analysis, and in generating streams for testing and compliance purposes. Moreover, video traffic is expected to be a major source of multimedia traffic in future heterogeneous networks, so the statistical distribution of video data can serve as an input for network performance modeling. This paper identifies a theoretical distribution that matches the statistical properties of a video trace and selects the best fit using both a graphical method and a hypothesis test. The dataset used in this article consists of layered video traces generated with the Scalable Video Codec (SVC) compression technique from three different movies.
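As an illustration of fitting a candidate distribution to a frame-size trace, the sketch below estimates lognormal parameters by the method of moments on log-transformed sizes. The choice of lognormal and the estimator are assumptions for the example; the paper compares several candidates via graphical methods and a hypothesis test:

```python
import math

def fit_lognormal(frame_sizes):
    """Estimate lognormal parameters (mu, sigma) from a list of
    positive frame sizes via moments of the log-transformed data."""
    logs = [math.log(s) for s in frame_sizes]
    n = len(logs)
    mu = sum(logs) / n                               # mean of log sizes
    var = sum((x - mu) ** 2 for x in logs) / n       # population variance
    return mu, math.sqrt(var)
```

The fitted (mu, sigma) pair can then drive a synthetic traffic generator or feed a goodness-of-fit test against the empirical trace.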


2019 ◽  
Vol 10 (3) ◽  
pp. 2426-2432 ◽  
Author(s):  
Arjun ◽  
Kanchana V

The spinal cord plays an important role in human life. Using digital image processing techniques, the interior of the human body can be analyzed with MRI, CT, X-ray, and similar modalities, and medical image processing is therefore used extensively in the medical field. In this work, we use MRI images to detect degenerative disease of the spinal cord. We first preprocess the MRI image and locate the degenerative part of the spinal cord using various segmentation approaches, and then classify each case as degenerative disease or normal spinal cord using various classification algorithms. For segmentation, we use an efficient semantic segmentation approach.


Author(s):  
Arcadi Llanza ◽  
Assan Sanogo ◽  
Marouan Khata ◽  
Alami Khalil ◽  
Nadiya Shvai ◽  
...  

2018 ◽  
Vol 7 (2.5) ◽  
pp. 1
Author(s):  
Khalil Khan ◽  
Nasir Ahmad ◽  
Irfan Uddin ◽  
Muhammad Ehsan Mazhar ◽  
Rehan Ullah Khan

Background and Objective: A novel face parsing method is proposed in this paper which partitions a facial image into six semantic classes. Unlike previous approaches, which segmented a facial image into three or four classes, we extend the class labels to six. Materials and Methods: A dataset of 464 images taken from the FEI, MIT-CBCL, Pointing’04, and SiblingsDB databases was annotated. A discriminative model was trained on features extracted from squared patches. The model was tested with two different semantic segmentation approaches: pixel-based and super-pixel-based semantic segmentation (PB_SS and SPB_SS). Results: Pixel labeling accuracies (PLA) of 94.68% and 90.35% were obtained with the PB_SS and SPB_SS methods, respectively, on frontal images. Conclusions: The proposed face-parsing method efficiently segments a facial image into its constituent parts.
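The patch-based feature extraction underlying pixel-based labelling can be sketched as follows: every pixel is described by the flattened squared patch centred on it, and a discriminative classifier is trained on those vectors. The patch size and zero-padding at borders are illustrative assumptions, not the paper's settings:

```python
def extract_patches(image, patch=3):
    """Return one feature vector per pixel: the flattened `patch` x `patch`
    neighbourhood around it, with out-of-image positions zero-padded."""
    h, w, r = len(image), len(image[0]), patch // 2
    feats = []
    for y in range(h):
        for x in range(w):
            vec = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    vec.append(image[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0)
            feats.append(vec)
    return feats
```

Each feature vector is then labelled with one of the six semantic classes; the super-pixel variant instead pools these features over a super-pixel before classification.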


Author(s):  
A. Adam ◽  
L. Grammatikopoulos ◽  
G. Karras ◽  
E. Protopapadakis ◽  
K. Karantzalos

Abstract. 3D semantic segmentation is the joint task of partitioning a point cloud into semantically consistent 3D regions and assigning each region a semantic class/label. While traditional approaches to 3D semantic segmentation typically rely only on structural information about the objects (i.e., object geometry and shape), in recent years many techniques combining visual and geometric features have emerged, taking advantage of progress in SfM/MVS algorithms that reconstruct point clouds from multiple overlapping images. Our work describes a hybrid methodology for 3D semantic segmentation that operates in both 2D and 3D space and explores whether image selection is critical to the accuracy of 3D semantic segmentation of point clouds. Experimental results are demonstrated on a freely available online dataset depicting city blocks around Paris. The experimental procedure not only validates that hybrid (geometric and visual) features achieve more accurate semantic segmentation, but also demonstrates the importance of selecting the most appropriate view for 2D feature extraction.
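The core idea of hybrid features is simply to concatenate per-point geometric descriptors with visual descriptors projected back from the selected image. A minimal sketch, where the particular descriptors (point height and normalized RGB) and the point-record layout are assumptions for illustration rather than the paper's exact feature set:

```python
def hybrid_features(points):
    """Build one hybrid feature vector per 3-D point by concatenating a
    geometric descriptor (height z) with visual descriptors (RGB in [0,1])."""
    feats = []
    for p in points:
        # p: dict with 'xyz' (coordinates) and 'rgb' (0-255, sampled from
        # the image chosen for this point by the view-selection step)
        x, y, z = p['xyz']
        r, g, b = (c / 255.0 for c in p['rgb'])   # normalise colour
        feats.append([z, r, g, b])                # geometric + visual vector
    return feats
```

A point-cloud classifier trained on these vectors sees both shape cues (via z) and appearance cues (via RGB), which is what the reported accuracy gain rests on; which image supplies the RGB values is exactly the view-selection question the paper investigates.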


2018 ◽  
Vol 1 (1) ◽  
Author(s):  
Zhitao Li

An audio and video decoding and synchronized playback system for MPEG-2 TS streams is designed and implemented on an ARM embedded platform. A hardware codec is embedded in the ARM processor, and to make full use of this resource the hardware MFC (multi-format codec) decoder is adopted to decode the video data, while the audio data are decoded with the open-source MAD (libmad) library. The V4L2 (Video for Linux 2) driver interface and the ALSA (Advanced Linux Sound Architecture) library are used for video and audio playback, respectively. Because the video frame playback period and the hardware processing delay are inconsistent, a time difference arises between the audio and video data operations, which causes audio and video playback to fall out of sync. We therefore synchronize video playback to the audio playback stream, keeping audio and video playback in sync. Test results show that the designed system can correctly decode and synchronize audio and video data.
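Slaving video to the audio clock, as described above, reduces to a per-frame scheduling decision: compare the frame's presentation timestamp with the current audio playback clock and wait, show, or drop accordingly. A minimal sketch; the 40 ms tolerance and the function name are assumptions, not values from the paper:

```python
def sync_action(video_pts, audio_clock, threshold=0.04):
    """Decide how to schedule the next video frame against the audio
    playback clock (audio as master). Times are in seconds."""
    drift = video_pts - audio_clock        # frame is early (+) or late (-)
    if drift > threshold:
        return 'wait'                      # ahead of audio: delay display
    if drift < -threshold:
        return 'drop'                      # behind audio: discard frame
    return 'show'                          # within tolerance: display now
```

Dropping late frames and delaying early ones keeps the video locked to the audio stream even when the hardware decoding delay varies from frame to frame.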
