Improving human object recognition performance using video enhancement techniques

2004 ◽  
Author(s):  
Lucy S. Whitman ◽  
Colin Lewis ◽  
John P. Oakley


Perception ◽
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 33-33
Author(s):  
G M Wallis ◽  
H H Bülthoff

The view-based approach to object recognition supposes that objects are stored as a series of associated views. Although representation of these views as combinations of 2-D features allows generalisation to similar views, it remains unclear how very different views might be associated together to allow recognition from any viewpoint. One cue present in the real world, other than spatial similarity, is that we usually experience different objects in a temporally constrained, coherent order, and not as randomly ordered snapshots. In a series of recent neural-network simulations, Wallis and Baddeley (1997, Neural Computation 9 883 - 894) describe how the association of views on the basis of temporal as well as spatial correlations is both theoretically advantageous and biologically plausible. We describe an experiment aimed at testing their hypothesis in human object-recognition learning. We investigated recognition performance for faces previously presented in sequences. These sequences consisted of five views of five different people's faces, presented in orderly sequence from left to right profile in 45° steps. According to the temporal-association hypothesis, the visual system should associate the images together and represent them as different views of the same person's face, although in truth they are images of different people's faces. In a same/different task, subjects were asked to say whether two faces seen from different viewpoints were views of the same person or not. In accordance with theory, discrimination errors increased for those faces seen earlier in the same sequence as compared with those faces which were not (p < 0.05).
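The temporal-association mechanism appealed to here can be caricatured as a Hebbian "trace rule" in the spirit of Wallis and Baddeley's simulations: an output unit keeps a decaying trace of its recent activity, so inputs that follow one another in time all become bound to the same unit. The sketch below is illustrative only; the learning rate, trace constant, and one-hot "views" are assumptions, not values from the paper.

```python
# Toy sketch of trace-rule temporal association. Views shown close
# together in time are associated with one output unit via a slowly
# decaying activity trace.
import numpy as np

def train_trace_unit(views, epochs=20, lr=0.1, eta=0.2):
    n_views, n_inputs = views.shape
    w = np.zeros(n_inputs)
    for _ in range(epochs):
        trace = 0.0
        for x in views:              # temporally ordered presentation
            y = 1.0                  # unit assumed active during its sequence
            trace = (1 - eta) * trace + eta * y
            w += lr * trace * x      # Hebbian update against the trace
    return w

# five one-hot "views", shown in order from left to right profile
views = np.eye(5)
w = train_trace_unit(views)
print(w)  # every view acquires positive weight: one view-invariant unit
```

After training, all five views drive the same unit, mirroring how views shown in one sequence come to be treated as one face, even when, as in the experiment, they belong to different people.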


2019 ◽  
Author(s):  
Eshed Margalit ◽  
Sarah B. Herald ◽  
Emily X. Meschke ◽  
Isabel Irawan ◽  
Rafael Maarek ◽  
...  

In 1968 Guzman showed how the myriad of surfaces composing a highly complex and novel assemblage of volumes can be readily assigned to their appropriate volumes in terms of the constraints offered by the vertices of coterminating edges. Of particular importance was the L-vertex, produced by the cotermination of two contours, which provides strong evidence for the termination of a 2D surface. An X-junction, formed by the crossing of two contours without a change of direction at the crossing, played no role in the segmentation of the scene. If the potency of noise elements to affect recognition performance reflected their relevancy to the segmentation of scenes, as suggested by Guzman, X-junctions would be expected to have little or no effect on shape-based object recognition, whereas L-junctions would be expected to have a strong deleterious effect when disrupting the smooth continuation of contours. Guzman's roles for the various vertices and junctions have never been put to systematic test with respect to human object recognition. By adding to line drawings of objects identical noise contours that produced either L-vertices or X-junctions, these shape features could be compared with respect to their disruption of object recognition. Guzman's insights that irrelevant L-vertices should be disruptive and irrelevant X-junctions should have minimal effect were confirmed.


Perception ◽  
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 202-202
Author(s):  
P Kalocsai ◽  
I Biederman

A recognition model which defines a measure of shape similarity on the direct output of multiscale and multiorientation Gabor filters does not manifest qualitative aspects of human object recognition of contour-deleted images in that: (a) it recognises recoverable and nonrecoverable contour-deleted images equally well, whereas humans recognise recoverable images much better; (b) it distinguishes complementary feature-deleted images whereas humans do not. Adding some of the known connectivity patterns of the primary visual cortex to the model in the form of extension fields (connections between collinear and curvilinear units) among filters (a) increased the overall recognition performance of the model, (b) boosted the recognition rate of the recoverable images far more than of the nonrecoverable ones, (c) increased the similarity of complementary feature-deleted images, but not part-deleted ones. These correspond more closely to human psychophysical results. Interestingly, performance was approximately equivalent for narrow (±15 deg) and broad (±90 deg) extension fields.
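The model's front end, a similarity measure taken directly on the output of a multiscale, multiorientation Gabor filter bank without extension fields, can be sketched as below. The filter wavelengths, orientation count, and coarse whole-image pooling are assumptions for illustration; the actual model pools responses spatially and adds collinear facilitation.

```python
# Minimal sketch: represent an image by the magnitudes of its responses
# to a bank of Gabor filters (several wavelengths x orientations), then
# score shape similarity as the cosine between the response vectors.
import numpy as np

def gabor_kernel(h, w, wavelength, theta):
    ys, xs = np.mgrid[0:h, 0:w]
    xr = (xs - w / 2) * np.cos(theta) + (ys - h / 2) * np.sin(theta)
    yr = -(xs - w / 2) * np.sin(theta) + (ys - h / 2) * np.cos(theta)
    sigma = wavelength / 2.0
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_features(img, wavelengths=(4, 8, 16), n_orient=4):
    h, w = img.shape
    feats = [abs(float(np.sum(img * gabor_kernel(h, w, wl, k * np.pi / n_orient))))
             for wl in wavelengths for k in range(n_orient)]
    return np.asarray(feats)

def similarity(a, b):
    fa, fb = gabor_features(a), gabor_features(b)
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-12))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
print(similarity(img, img))  # identical images score ~1.0
```

Extension fields would add a further step, re-weighting each unit's response by the responses of collinear and curvilinear neighbours, which is what boosts recoverable contour-deleted images in the model.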


2019 ◽  
Vol 35 (05) ◽  
pp. 525-533
Author(s):  
Evrim Gülbetekin ◽  
Seda Bayraktar ◽  
Özlenen Özkan ◽  
Hilmi Uysal ◽  
Ömer Özkan

Abstract
The authors tested face discrimination, face recognition, object discrimination, and object recognition in two face transplantation patients (FTPs) who had had facial injuries since infancy, a patient who had undergone facial surgery for a recent wound, and two control subjects. In Experiment 1, the authors showed them original faces and morphed forms of those faces and asked them to rate the similarity between the two. In Experiment 2, they showed old, new, and implicit faces and asked whether they recognized them or not. In Experiment 3, they showed them original objects and morphed forms of those objects and asked them to rate the similarity between the two. In Experiment 4, they showed old, new, and implicit objects and asked whether they recognized them or not. Object discrimination and object recognition performance did not differ between the FTPs and the controls. However, the face discrimination performance of FTP2 and the face recognition performance of FTP1 were poorer than those of the controls. Therefore, the authors concluded that the structure of the face might affect face processing.


1989 ◽  
Vol 12 (3) ◽  
pp. 381-397 ◽  
Author(s):  
Gary W. Strong ◽  
Bruce A. Whitehead

Abstract
Purely parallel neural networks can model object recognition in brief displays – the same conditions under which illusory conjunctions (the incorrect combination of features into perceived objects in a stimulus array) have been demonstrated empirically (Treisman 1986; Treisman & Gelade 1980). Correcting errors of illusory conjunction is the “tag-assignment” problem for a purely parallel processor: the problem of assigning a spatial tag to nonspatial features, feature combinations, and objects. This problem must be solved to model human object recognition over a longer time scale. Our model simulates both the parallel processes that may underlie illusory conjunctions and the serial processes that may solve the tag-assignment problem in normal perception. One component of the model extracts pooled features and another provides attentional tags that correct illusory conjunctions. Our approach addresses two questions: (i) How can objects be identified from simultaneously attended features in a parallel, distributed representation? (ii) How can the spatial selectional requirements of such an attentional process be met by a separation of pathways for spatial and nonspatial processing? Our analysis of these questions yields a neurally plausible simulation of tag assignment based on synchronizing feature processing activity in a spatial focus of attention.
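The tag-assignment idea — binding features to objects by synchronising their activity within an attentional focus — reduces, in caricature, to grouping features by a shared timing tag. The scene, feature names, and integer "phases" below are invented for illustration and stand in for the model's synchronized activity.

```python
# Caricature of tag assignment by synchrony: each detected feature fires
# in some phase; the phase acts as the attentional tag, and features
# sharing a phase are bound into one object, preventing illusory
# conjunctions such as binding "red" to "square".
def bind_by_phase(features):
    """features: iterable of (feature_name, phase). Returns a dict
    mapping each phase (object tag) to the set of features bound to it."""
    objects = {}
    for name, phase in features:
        objects.setdefault(phase, set()).add(name)
    return objects

# a "red circle" next to a "green square": colour and shape carry
# matching tags, so each colour binds only to its own shape
scene = [("red", 1), ("circle", 1), ("green", 2), ("square", 2)]
print(bind_by_phase(scene))  # two objects, each with its own colour-shape pair
```

An illusory conjunction in this caricature is what happens when the phase tags are absent or noisy — exactly the brief-display regime the purely parallel component models.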


Author(s):  
Michael S. Brickner ◽  
Amir Zvuloni

Thermal imaging (TI) systems transform the distribution of relative temperatures in a scene into a visible TV image. TI images differ significantly from regular TV images. Most TI systems allow their operators to select a preferred polarity, which determines the way in which gray shades represent different temperatures. Polarity may be set to either black hot (BH) or white hot (WH). The present experiments were designed to investigate the effects of polarity on object recognition performance in TI and to compare the object recognition performance of experts and novices. In the first experiment, twenty flight candidates were asked to recognize target objects in 60 dynamic TI recordings taken from two different TI systems. The targets included a variety of human-placed and natural objects. Each subject viewed half the targets in BH and the other half in WH polarity in a balanced experimental design. For 24 out of the 60 targets one direction of polarity produced better performance than the other. Although the direction of superior polarity (BH or WH better) was not consistent, the preferred representation of the target object was very consistent. For example, vegetation was more readily recognized when presented as dark objects on a brighter background. The results are discussed in terms of the importance of surface determinants versus edge determinants in the recognition of TI objects. In the second experiment, the performance of 10 expert TI users was found to be significantly more accurate but not much faster than the performance of 20 novice subjects.
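The polarity switch itself is just a gray-level inversion: the mapping from temperature to shade flips while the ordering of temperatures is preserved. A minimal sketch, assuming 8-bit gray levels (the pixel values are invented):

```python
# White-hot <-> black-hot: invert the gray levels. Pixels that were
# bright (hot in WH) become dark (hot in BH) and vice versa; the
# relative-temperature information in the image is unchanged.
def invert_polarity(image, max_level=255):
    return [[max_level - p for p in row] for row in image]

white_hot = [[0, 128, 255],
             [64, 192, 32]]
black_hot = invert_polarity(white_hot)
print(black_hot)  # [[255, 127, 0], [191, 63, 223]]
```

Because inversion loses no information, the performance differences reported here must come from how observers interpret the shades, not from the images themselves — consistent with the surface-determinant account (e.g. vegetation read more easily as dark on bright).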


2021 ◽  
Vol 7 (4) ◽  
pp. 65
Author(s):  
Daniel Silva ◽  
Armando Sousa ◽  
Valter Costa

Object recognition represents the ability of a system to identify objects, humans or animals in images. Within this domain, this work presents a comparative analysis among different classification methods aiming at Tactode tile recognition. The covered methods include: (i) machine learning with HOG and SVM; (ii) deep learning with CNNs such as VGG16, VGG19, ResNet152, MobileNetV2, SSD and YOLOv4; (iii) matching of handcrafted features with SIFT, SURF, BRISK and ORB; and (iv) template matching. A dataset was created to train learning-based methods (i and ii), and with respect to the other methods (iii and iv), a template dataset was used. To evaluate the performance of the recognition methods, two test datasets were built: tactode_small and tactode_big, which consisted of 288 and 12,000 images, holding 2784 and 96,000 regions of interest for classification, respectively. SSD and YOLOv4 were the worst methods for their domain, whereas ResNet152 and MobileNetV2 showed that they were strong recognition methods. SURF, ORB and BRISK demonstrated great recognition performance, while SIFT was the worst of this type of method. The methods based on template matching attained reasonable recognition results, falling behind most other methods. The top three methods of this study were: VGG16 with an accuracy of 99.96% and 99.95% for tactode_small and tactode_big, respectively; VGG19 with an accuracy of 99.96% and 99.68% for the same datasets; and HOG and SVM, which reached an accuracy of 99.93% for tactode_small and 99.86% for tactode_big, while at the same time presenting average execution times of 0.323 s and 0.232 s on the respective datasets, being the fastest method overall. This work demonstrated that VGG16 was the best choice for this case study, since it minimised the misclassifications for both test datasets.
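Method (iv), template matching, can be sketched as a normalised cross-correlation search: slide the template over the image and keep the best-scoring position. This is a generic stand-in for the template-matching pipeline, not the paper's actual implementation, and the image and template contents below are invented.

```python
# Sketch of template matching by normalised cross-correlation (NCC).
# Scores lie in [-1, 1]; 1.0 means an exact (up to scale/offset) match.
import numpy as np

def ncc_match(image, template):
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best_score, best_pos = -2.0, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            window = image[y:y + th, x:x + tw]
            w = window - window.mean()
            denom = np.linalg.norm(w) * t_norm
            score = float((w * t).sum() / denom) if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_score, best_pos

img = np.zeros((10, 10))
img[3:6, 4:7] = np.eye(3)      # a small diagonal pattern at (3, 4)
tpl = np.eye(3)
score, pos = ncc_match(img, tpl)
print(round(score, 3), pos)    # 1.0 (3, 4)
```

Its simplicity is also its weakness: the exhaustive sliding search and sensitivity to scale and rotation are why the template-based methods fall behind the learned classifiers in the comparison above.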

