A deep learning approach to 2D/3D object affordance understanding

Mapping Intimacies ◽

10.12681/eadd/47421 ◽

2020 ◽

Author(s):

Σπυρίδων Θερμός

Keyword(s):

Deep Learning ◽

Learning Approach ◽

3D Object ◽

Object Affordance

The ability to recognize the objects that surround us, and to exploit the rich visual information that characterizes them, is a major challenge for the field of computer vision. Objects are key elements in a wide range of applications, from scene understanding and automation to security and robotics. In recent years, significant progress has been made in the detection and recognition of 2D/3D objects using deep learning techniques, accompanied by substantial gains in computing power. However, finding effective algorithms for understanding the characteristics of an object remains an open challenge, since existing research focuses mainly on appearance features such as shape and color and ignores the object's functionality (its affordance). This thesis develops models and techniques for understanding object affordance, which determines the ways in which humans can use an object. First, the effect of object affordance as an additional feature for object recognition is examined. This feature is extracted by observing human-object interaction sequences. Drawing on recent findings from neuroscience research, so-called "sensorimotor" learning is applied for the first time in computer vision, using deep learning models to combine appearance features with affordance features (obtained through motion) in order to improve 2D/3D object recognition in videos and images. Next, an encoder-decoder model is presented for detecting and segmenting, at the pixel level, the part of the object that supports specific uses. This process is applicable to both video and image data. The model can also focus on the point of contact between the human and the object during the interaction, without requiring additional information such as the class or the exact location of the object. Finally, the thesis presents the first extensive dataset that can be used to train and evaluate models that process object affordance features. The dataset is publicly available and consists of RGB-D video data (that is, every frame contains both color data and depth data) depicting human interactions with objects. It also contains annotations for these data in the form of object and interaction classes, at the video, image, and pixel level. The effectiveness of the models designed for these purposes is demonstrated through extensive experiments on data from this dataset. Comparing the results with corresponding results from the literature leads to two conclusions. First, object recognition clearly improves when affordance is used as an additional feature. Second, the part of the object that supports a specific affordance can be accurately detected and segmented in video and image data, without requiring additional information about the object.


AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection

2018 IEEE International Conference on Robotics and Automation (ICRA) ◽

10.1109/icra.2018.8460902 ◽

2018 ◽

Cited By ~ 31

Author(s):

Thanh-Toan Do ◽

Anh Nguyen ◽

Ian Reid

Keyword(s):

Deep Learning ◽

Learning Approach ◽

Object Affordance ◽

End To End


A Deep Learning Approach to Object Affordance Segmentation

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp40776.2020.9054167 ◽

2020 ◽

Author(s):

Spyridon Thermos ◽

Petros Daras ◽

Gerasimos Potamianos

Keyword(s):

Deep Learning ◽

Learning Approach ◽

Object Affordance


Comparison of various Activation Functions: A Deep Learning Approach

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i3.122126 ◽

2018 ◽

Vol 6 (3) ◽

pp. 122-126

Author(s):

Mohammed Ibrahim Khan ◽


Akansha Singh ◽

Anand Handa ◽


...

Keyword(s):

Deep Learning ◽

Learning Approach ◽

Activation Functions


A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/3/3 ◽

2020 ◽

Vol 17 (3) ◽

pp. 299-305 ◽

Cited By ~ 1

Author(s):

Riaz Ahmad ◽

Saeeda Naz ◽

Muhammad Afzal ◽

Sheikh Rashid ◽

Marcus Liwicki ◽

...

Keyword(s):

Deep Learning ◽

Character Recognition ◽

Data Augmentation ◽

Short Term Memory ◽

Recognition System ◽

Learning Approach ◽

Arabic Text ◽

Data Set ◽

Processing Step ◽

Handwritten Arabic

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text lines. The paper makes three main contributions: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step prunes extra white space and de-skews skewed text lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Data augmentation combined with the deep learning approach raises the Character Recognition (CR) rate to 80.02%, up from a 75.08% baseline.


Author response for "Deep learning and reinforcement learning approach on microgrid"

10.1002/2050-7038.12531/v2/response1 ◽

2020 ◽

Author(s):

Kumar Chandrasekaran ◽

Prabaakaran Kandasamy ◽

Srividhya Ramanathan

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Author Response ◽

Learning Approach


Deep Learning Approach for the Detection of Depression in Twitter

SSRN Electronic Journal ◽

10.2139/ssrn.3441333 ◽

2019 ◽

Author(s):

Aswathy K S ◽

Rafeeque P C ◽

Reena Murali

Keyword(s):

Deep Learning ◽

Learning Approach ◽

Detection Of Depression


An Ensemble Deep Learning Approach to Explore the Impact of Enticement, Engagement and Experience in Reward Based Crowdfunding

SSRN Electronic Journal ◽

10.2139/ssrn.3615176 ◽

2020 ◽

Author(s):

Arvind Srinivasan ◽

Akilandeswari P

Keyword(s):

Deep Learning ◽

Learning Approach ◽

The Impact


A Deep Learning Approach to Estimate Forward Default Intensities

SSRN Electronic Journal ◽

10.2139/ssrn.3657019 ◽

2020 ◽

Author(s):

Marc-Aurèle Divernois

Keyword(s):

Deep Learning ◽

Learning Approach


Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement

10.21437/interspeech.2018-1020 ◽

2018 ◽

Cited By ~ 1

Author(s):

Shuai Nie ◽

Shan Liang ◽

Bin Liu ◽

Yaping Zhang ◽

Wenju Liu ◽

...

Keyword(s):

Signal Processing ◽

Deep Learning ◽

Speech Enhancement ◽

Learning Approach


Virtual Screening Meets Deep Learning

Current Computer-Aided Drug Design ◽

10.2174/1573409914666181018141602 ◽

2018 ◽

Vol 15 (1) ◽

pp. 6-28 ◽

Cited By ~ 6

Author(s):

Javier Pérez-Sianes ◽

Horacio Pérez-Sánchez ◽

Fernando Díaz

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Deep Learning ◽

Virtual Screening ◽

Great Increase ◽

New Drugs ◽

Learning Approach ◽

Screening Strategies ◽

Computer Aided ◽

Recent Developments

Background: Automated compound testing is currently the de facto standard method for drug screening, but it has not brought the great increase in the number of new drugs that was expected. Computer-aided compound search, known as Virtual Screening, has shown benefits in this field as a complement or even an alternative to robotic drug discovery. There are different methods and approaches to address this problem, and most of them fall under one of the main screening strategies. Machine learning, however, has established itself as a virtual screening methodology in its own right, and it may grow in popularity with the new trends in artificial intelligence. Objective: This paper attempts to provide a comprehensive and structured review that collects the most important proposals made so far in this area of research. Particular attention is given to a recent development in the machine learning field: the deep learning approach, which is pointed out as a future key player in the virtual screening landscape.
