Grounding Action Descriptions in Videos

Author(s):  
Michaela Regneri ◽  
Marcus Rohrbach ◽  
Dominikus Wetzel ◽  
Stefan Thater ◽  
Bernt Schiele ◽  
...  

Recent work has shown that the integration of visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos. We present a general purpose corpus that aligns high quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action descriptions are to each other. Experimental results demonstrate that a text-based model of similarity between actions improves substantially when combined with visual information from videos depicting the described actions.
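The idea of combining a text-based similarity score with visual information can be illustrated with a minimal late-fusion sketch. It assumes that each action description and each video clip is already represented as a numeric feature vector, and it interpolates the two cosine similarities with a hypothetical weight `alpha`. This is only an illustration of the general idea; the abstract does not say how the authors combine the two modalities.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def combined_similarity(text_u, text_v, video_u, video_v, alpha=0.5):
    """Hypothetical late fusion: alpha weights the text score, (1 - alpha) the visual score."""
    return alpha * cosine(text_u, text_v) + (1 - alpha) * cosine(video_u, video_v)

# Two descriptions with no words in common, but whose videos look alike:
# the text score alone is 0, and the visual score pulls the combined score up.
print(combined_similarity([1, 0], [0, 1], [1, 2], [2, 4]))
```

With `alpha=0.5`, the example above yields a combined score of about 0.5, even though the purely text-based score is 0.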

Author(s):  
Hu Xu ◽  
Bing Liu ◽  
Lei Shu ◽  
Philip S. Yu

Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when learning the embedding for a new domain, the system has already seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from those past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word across many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced by this process improve the performance of downstream tasks.
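The retrieval step can be pictured with a hand-written stand-in for the meta-learner. The sketch compares the context words of a target word in the new domain with its contexts in each past domain, then borrows past-domain sentences whose contexts look similar. It replaces the learned meta-learner with plain cosine similarity over context counts, so it is not the authors' method. The names `context_vector` and `expand_corpus`, the window size of 2, and the 0.3 threshold are all hypothetical.

```python
import math
from collections import Counter

def context_vector(corpus, word, window=2):
    """Count tokens within `window` positions of `word` in a tokenized corpus."""
    counts = Counter()
    for sentence in corpus:
        for i, tok in enumerate(sentence):
            if tok == word:
                lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
                # Skip the target position itself; keep its neighbours.
                counts.update(t for j, t in enumerate(sentence[lo:hi], start=lo) if j != i)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_corpus(new_corpus, past_corpora, word, threshold=0.3):
    """Hypothetical stand-in for meta-learned retrieval: add past-domain sentences
    containing `word` whenever its contexts there resemble those in the new domain."""
    target = context_vector(new_corpus, word)
    expanded = list(new_corpus)
    for past in past_corpora:
        if cosine(target, context_vector(past, word)) >= threshold:
            expanded.extend(s for s in past if word in s)
    return expanded

new = [["battery", "life", "is", "long"]]
laptops = [["battery", "life", "is", "short"]]
cars = [["car", "battery", "died", "overnight"]]
print(expand_corpus(new, [laptops, cars], "battery"))
```

Here the laptop-review sentence is borrowed because "battery" appears in the same contexts. The car sentence is not borrowed, because its contexts share nothing with the new domain.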


2020 ◽  
Vol 2020 (4) ◽  
pp. 116-1-116-7
Author(s):  
Raphael Antonius Frick ◽  
Sascha Zmudzinski ◽  
Martin Steinebach

In recent years, the number of forged videos circulating on the Internet has increased immensely. Software and services for creating such forgeries have become more and more accessible to the public. As a result, the risk of malicious use of forged videos has risen. This work proposes an approach, based on the Ghost effect known from image forensics, for detecting video forgeries that replace faces in video sequences or change a face's expression. The experimental results show that the proposed approach is able to identify forgery in high-quality encoded video content.
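The following is a simplified, one-dimensional sketch of the principle. It assumes that the Ghost effect here refers to the JPEG-ghost phenomenon: content that was previously quantized with some step changes least when it is requantized with that same step. The sketch quantizes hypothetical transform coefficients once, requantizes them with a range of candidate steps, and picks the step with the smallest mean squared difference. It works on a plain list of numbers rather than on decoded video frames, so it is not the authors' detector. The function names are illustrative.

```python
def quantize(values, step):
    """Uniform quantization, as applied to transform coefficients in JPEG-style coding."""
    return [step * round(v / step) for v in values]

def ghost_curve(values, steps):
    """Mean squared difference between `values` and their requantized versions, per step."""
    curve = {}
    for q in steps:
        requantized = quantize(values, q)
        curve[q] = sum((a - b) ** 2 for a, b in zip(values, requantized)) / len(values)
    return curve

def estimate_previous_step(values, steps):
    """The 'ghost' is the dip in the curve at the step used by the earlier compression."""
    curve = ghost_curve(values, steps)
    return min(curve, key=curve.get)

# Hypothetical coefficients that were previously compressed with quantization step 17.
original = [0.37 * k for k in range(-200, 201)]
compressed = quantize(original, 17)
print(estimate_previous_step(compressed, range(2, 31)))  # → 17
```

In a forgery, a spliced region (such as a replaced face) may have a different compression history from its surroundings. Its dip would then appear at a different step, which is what makes the effect usable for localization.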


10.28945/3391 ◽  
2009 ◽  
Author(s):  
Moshe Pelleh

In our world, where most systems are becoming embedded systems, embedded systems are still frequently designed in much the same way as organic (non-embedded) systems. An organic system, like a personal computer or a workstation, must be able to run any task submitted to it at any time (within certain constraints depending on the machine). Consequently, it must have a sophisticated general-purpose Operating System (OS) to schedule, dispatch, maintain and monitor the tasks and to assist them in special cases (particularly communication and synchronization among themselves and with external devices). These OSs impose overhead on memory, on the cache and on run time. Moreover, they are generally task oriented rather than machine oriented; therefore the processor's throughput is penalized. An embedded system, on the other hand, like an Anti-lock Braking System (ABS), always executes the same software application. Frequently it is a small or medium-size system, or made up of several such systems. Many small or medium-size embedded systems with a limited number of tasks can be scheduled by our proposed hardware architecture, based on the Motorola 500 MHz MPC7410 processor, which enhances the processor's throughput and avoids the overhead, complexity, maintenance and price of a software OS. Encouraged by our experimental results, we plan to develop a compiler to support our method. In the meantime, we present our proposal and the experimental results here.


Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 741
Author(s):  
Yuseok Ban ◽  
Kyungjae Lee

Many researchers have suggested improving user retention on digital platforms by using recommender systems. Recent studies show that, beyond high-precision rating predictions, there are many potential ways to help users find interesting items. In this paper, we study how the diverse types of information presented to a user can influence their behavior. The types are divided into visual, evaluative, categorical, and narrational information. Based on our experimental results, we analyze how different types of supplementary information affect the performance of a recommender in terms of encouraging users to click on more items or spend more time on the digital platform.


2021 ◽  
Vol 11 (7) ◽  
pp. 2987
Author(s):  
Takumi Okumura ◽  
Yuichi Kurita

Image therapy, which creates illusions with a mirror and a head-mounted display, assists movement relearning in stroke patients. Mirror therapy presents the movement of the unaffected limb in a mirror, creating the illusion of movement of the affected limb. As the visual information of images cannot create a fully immersive experience, we propose a cross-modal strategy that supplements the image with sensory information. By integrating the stimuli received from multiple sensory organs, the brain fills in missing senses, and the patient experiences a different sense of motion. Our system generates the sense of stair-climbing in a subject walking on a level floor. The force sensation is presented by a pneumatic gel muscle (PGM). Based on motion analysis in a human lower-limb model and the characteristics of the force exerted by the PGM, we set the appropriate air pressure of the PGM. The effectiveness of the proposed system was evaluated by surface electromyography and a questionnaire. The experimental results showed that by synchronizing the force sensation with visual information, we could match the motor and perceived sensations at the muscle-activity level, enhancing the sense of stair-climbing. The results also showed that the visual condition significantly improved the illusion intensity during stair-climbing.


2021 ◽  
Vol 11 (15) ◽  
pp. 7169
Author(s):  
Mohamed Allouche ◽  
Tarek Frikha ◽  
Mihai Mitrea ◽  
Gérard Memmi ◽  
Faten Chaabane

To bridge the current gap between Blockchain expectations and the intensive computation constraints of Blockchains, the present paper advances a lightweight processing solution, based on a load-balancing architecture and compatible with lightweight/embedded processing paradigms. In this way, the execution of complex operations is securely delegated to an off-chain general-purpose computing machine, while the core Blockchain operations are kept on-chain. The illustrations correspond to an on-chain Tezos configuration and to a multiprocessor ARM embedded platform (integrated into a Raspberry Pi). Performance is assessed in terms of security, execution time, and CPU consumption when performing a visual document fingerprinting task. It is thus demonstrated that the proposed solution makes it possible to deploy a computing-intensive application under severely constrained computation and memory resources, such as those of a Raspberry Pi 3. The experimental results show that up to nine Tezos nodes can be deployed on a single Raspberry Pi 3 and that the limitation stems not from memory but from computation resources. The execution time with a limited number of fingerprints is 40% higher than with a classical PC solution (value computed with 95% relative error lower than 5%).


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1471
Author(s):  
Yongxiang Wang ◽  
William Clifford ◽  
Charles Markham ◽  
Catherine Deegan

Distractions external to a vehicle contribute to visual attention diversion that may cause traffic accidents. As a low-cost and efficient advertising solution, billboards are widely installed along roadsides, especially on motorways. However, the effect of billboards on driver distraction, eye gaze, and cognition has not been fully investigated. This study uses a customised driving simulator and a synchronised electroencephalography (EEG) and eye-tracking system to investigate the cognitive processes involved in drivers' processing of visual information. A distinction is made between eye-gaze fixations on stimuli that assist driving and fixations on stimuli that may be a source of distraction. The study compares the driver’s cognitive responses to fixations on billboards with those to fixations on the vehicle dashboard. The measured eye-fixation-related potential (EFRP) shows that the P1 components are similar; however, the subsequent N1 and P2 components differ. In addition, an EEG motor response is observed when the driver adjusts the driving speed in response to speed limit signs. The experimental results demonstrate that the proposed measurement system is a valid tool for assessing driver cognition, and they suggest that the level of cognitive engagement with billboards is likely to be a precursor to driver distraction. The experimental results are compared with the human information processing model found in the literature.
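In general, an eye-fixation-related potential is obtained by cutting the EEG into epochs time-locked to fixation onsets (taken from the eye tracker) and averaging them. This makes components such as P1, N1 and P2 stand out from the background activity. Below is a minimal single-channel sketch. The function name `fixation_related_potential`, the window lengths, and the mean-baseline correction are assumptions. The abstract does not describe the study's own preprocessing, such as filtering or artifact rejection.

```python
def fixation_related_potential(signal, onsets, pre, post):
    """Average EEG epochs time-locked to fixation onsets (given as sample indices).

    Each epoch spans [onset - pre, onset + post) and is baseline-corrected by
    subtracting its mean over the `pre` samples before the onset (pre must be > 0).
    """
    epochs = []
    for t in onsets:
        if t - pre < 0 or t + post > len(signal):
            continue  # skip epochs that would run past the recording
        epoch = signal[t - pre:t + post]
        baseline = sum(epoch[:pre]) / pre
        epochs.append([x - baseline for x in epoch])
    if not epochs:
        raise ValueError("no complete epochs")
    n = len(epochs)
    return [sum(e[i] for e in epochs) / n for i in range(pre + post)]

# Toy recording: a deflection one sample after each fixation onset (samples 5 and 12).
signal = [0.0] * 20
signal[6] = 5.0
signal[13] = 5.0
print(fixation_related_potential(signal, [5, 12], pre=2, post=3))
```

Comparing billboard and dashboard fixations, as the study does, would amount to computing one such average per fixation category and comparing the resulting waveforms.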


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Nathan Tessema Ersumo ◽  
Cem Yalcin ◽  
Nick Antipa ◽  
Nicolas Pégard ◽  
Laura Waller ◽  
...  

Dynamic axial focusing functionality has recently experienced widespread incorporation in microscopy, augmented/virtual reality (AR/VR), adaptive optics and material processing. However, the limitations of existing varifocal tools continue to beset the performance capabilities and operating overhead of the optical systems that mobilize such functionality. The varifocal tools that are the least burdensome to operate (e.g. liquid crystal, elastomeric or optofluidic lenses) suffer from low (≈100 Hz) refresh rates. Conversely, the fastest devices sacrifice either critical capabilities such as their dwelling capacity (e.g. acoustic gradient lenses or monolithic micromechanical mirrors) or low operating overhead (e.g. deformable mirrors). Here, we present a general-purpose random-access axial focusing device that bridges these previously conflicting features of high speed, dwelling capacity and lightweight drive by employing low-rigidity micromirrors that exploit the robustness of defocusing phase profiles. Geometrically, the device consists of an 8.2-mm-diameter array of piston-motion, 48-μm-pitch micromirror pixels that provide 2π phase shifting for wavelengths shorter than 1100 nm with 10–90% settling in 64.8 μs (i.e., 15.44 kHz refresh rate). The pixels are electrically partitioned into 32 rings for a driving scheme that enables phase-wrapped operation with circular symmetry and requires <30 V per channel. Optical experiments demonstrated the array’s wide focusing range with a measured ability to target 29 distinct resolvable depth planes. Overall, the features of the proposed array offer the potential for compact, straightforward methods of tackling bottlenecked applications, including high-throughput single-cell targeting in neurobiology and the delivery of dense 3D visual information in AR/VR.
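Phase-wrapped operation with circular symmetry can be sketched as follows. An ideal lens of focal length f imposes the quadratic phase −πr²/(λf). Because only the phase modulo 2π matters for monochromatic light, each ring needs at most 2π of phase. The sketch uses the 4.1 mm aperture radius (half of the 8.2 mm diameter) and the 32 rings from the abstract, but the equal-width ring partition and sampling at each ring's mid-radius are assumptions, not the device's actual layout. It also checks the refresh-rate arithmetic: 1/64.8 μs ≈ 15.43 kHz, which agrees with the reported 15.44 kHz within the rounding of the settling time.

```python
import math

def ring_phases(focal_length_m, wavelength_m, aperture_radius_m=4.1e-3, n_rings=32):
    """Wrapped defocus phase for each of `n_rings` concentric rings.

    Assumes equal-width rings (hypothetical) and samples the ideal lens phase
    -pi * r**2 / (wavelength * f) at each ring's mid-radius, then wraps it
    into the interval [0, 2*pi) with the modulo operator.
    """
    width = aperture_radius_m / n_rings
    phases = []
    for k in range(n_rings):
        r = (k + 0.5) * width
        phi = -math.pi * r * r / (wavelength_m * focal_length_m)
        phases.append(phi % (2 * math.pi))
    return phases

# Refresh rate implied by the 10-90% settling time quoted in the abstract.
settling_time_s = 64.8e-6
print(f"{1 / settling_time_s / 1e3:.2f} kHz")  # 15.43 kHz

# Hypothetical target: 0.5 m focal length at 633 nm.
print(ring_phases(0.5, 633e-9)[:4])
```

Refocusing to a different depth plane then only requires loading a new set of 32 ring values, which fits the low channel count of the driving scheme described above.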


2020 ◽  
Vol 12 (4) ◽  
pp. 676 ◽  
Author(s):  
Yong Yang ◽  
Wei Tu ◽  
Shuying Huang ◽  
Hangyuan Lu

Pansharpening is the process of fusing a low-resolution multispectral (LRMS) image with a high-resolution panchromatic (PAN) image. In the process of pansharpening, the LRMS image is often directly upsampled by a scale of 4, which may result in the loss of high-frequency details in the fused high-resolution multispectral (HRMS) image. To solve this problem, we put forward a novel progressive cascade deep residual network (PCDRN) with two residual subnetworks for pansharpening. The network adjusts the size of an MS image to the size of a PAN image in two steps and gradually fuses the LRMS image with the PAN image in a coarse-to-fine manner. To prevent over-smoothing and achieve high-quality fusion results, a multitask loss function is defined to train our network. Furthermore, to eliminate checkerboard artifacts in the fusion results, we employ a resize-convolution approach instead of transposed convolution for upsampling LRMS images. Experimental results on the Pléiades and WorldView-3 datasets show that PCDRN performs better than other popular pansharpening methods in terms of both quantitative and visual assessments.
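Resize-convolution upsamples with a fixed interpolation first and then applies an ordinary convolution. Every output pixel therefore receives the same number of kernel contributions, which avoids the uneven overlap commonly blamed for checkerboard artifacts in strided transposed convolution. The pure-Python sketch below applies two ×2 stages to reach the overall factor of 4. Nearest-neighbour resizing, the 3×3 zero-padded kernel, and the ×2-per-stage split are assumptions; the abstract does not specify them.

```python
def nearest_upsample(img, factor=2):
    """Nearest-neighbour resize: repeat each pixel `factor` times in both directions."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def conv3x3_same(img, kernel):
    """3x3 cross-correlation (as in deep-learning conv layers) with zero padding."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * img[y][x]
            out[i][j] = acc
    return out

def resize_conv(img, kernel, factor=2):
    """Resize-convolution: fixed upsampling followed by a learnable-style convolution."""
    return conv3x3_same(nearest_upsample(img, factor), kernel)

# Two hypothetical x2 stages take a 2x2 MS band to 8x8 (overall factor 4).
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
lrms = [[1.0, 1.0], [1.0, 1.0]]
hrms = resize_conv(resize_conv(lrms, identity), identity)
print(len(hrms), len(hrms[0]))  # 8 8
```

In a trained network, the kernels would be learned weights rather than the identity used here. The identity kernel simply makes it easy to see that a flat input stays flat, with no periodic pattern introduced by the upsampling.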


2017 ◽  
Vol 54 (4) ◽  
pp. 475-488
Author(s):  
MATTHEW McKEEVER

In this article, I argue that recent work in analytic philosophy on the semantics of names and the metaphysics of persistence supports two theses in Buddhist philosophy, namely the impermanence of objects and a corollary about how referential language works. According to this latter package of views, the various parts of what we call one object (say, King Milinda) possess no unity in and of themselves. Unity comes rather from language, in that we have terms (say, ‘King Milinda’) which stand for all the parts taken together. Objects are mind- (or rather language-)generated fictions. I think this package can be cashed out in terms of two central contemporary views. The first is that there are temporal parts: just as an object is spatially extended by having spatial parts at different spatial locations, so it is temporally extended by having temporal parts at different temporal locations. The second is that names are predicates: rather than standing for any one thing, a name stands for a range of things. The natural language term ‘Milinda’ is not akin to a logical constant, but akin to a predicate. Putting this together, I'll argue that names are predicates with temporal parts in their extension, which parts have no unity apart from falling under the same predicate. ‘Milinda’ is a predicate which has in its extension all Milinda's parts. The result is an interesting and original synthesis of plausible positions in semantics and metaphysics, which makes good sense of a central Buddhist doctrine.

