Deep Learning-Based Multi-Modal Data Fusion: A Case Study in Food Intake Episodes Detection Using Wearable Sensors (Preprint)
BACKGROUND Multimodal wearable technologies have opened up wide possibilities in human activity recognition and, more specifically, in personalized monitoring of eating habits. The emerging challenge is selecting the most discriminative information from high-dimensional data collected from multiple sources. Existing fusion algorithms, with their complex structure, are poorly adapted to computationally constrained environments that require information to be integrated directly at the source; a simpler, low-level fusion method is therefore needed.
OBJECTIVE Without a data-combining step, feeding high-dimensional raw data directly to a deep classifier is computationally expensive in terms of response time, energy consumption, and memory requirements. This study therefore aimed to develop a computationally efficient data fusion technique that gives a more comprehensive view of human activity dynamics in a lower-dimensional space. The main objective was to account for the statistical dependency of multisensory data and to explore inter-modality correlation patterns for different activities.
METHODS In this technique, the time-domain information, regardless of the number of sources, is transformed into a 2D space that facilitates distinguishing eating episodes from other activities. The approach rests on the hypothesis that data captured by different sensors are statistically associated with one another, and that the covariance matrix of all these signals has a distribution specific to each activity, which can be encoded as a contour representation. These representations are then used as input to a deep model that learns the patterns associated with each activity.
RESULTS To show the generalizability of the proposed fusion algorithm, two scenarios were considered. The scenarios differed in temporal segment size, type of activity, wearable device, subjects, and deep learning architecture. The first scenario used a dataset in which a single participant performed a limited number of activities while wearing an Empatica E4 wristband. The second scenario used a dataset of activities of daily living in which 10 participants wore Inertial Measurement Units while performing a more complex set of activities. The precision obtained from leave-one-subject-out cross-validation for the second scenario reached 0.803. The impact of missing data on performance degradation was also evaluated.
CONCLUSIONS The proposed fusion technique makes it possible to embed joint variability information across different modalities in a single 2D representation. This yields a more global view of the different aspects of daily human activities while preserving the desired level of activity recognition performance.
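The covariance-based fusion step described in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `covariance_image` and `sliding_windows`, the window size and step, the min-max scaling, and the synthetic 6-channel signal are all assumptions made for illustration, and the contour rendering stage is omitted (the scaled matrix is used directly as the 2D input).

```python
import numpy as np


def covariance_image(window: np.ndarray) -> np.ndarray:
    """Fuse a (n_samples, n_channels) window into one (n_channels, n_channels) image.

    Hypothetical helper: the covariance matrix across channels is min-max
    scaled to [0, 1] so that it can be fed to an image-based classifier.
    """
    window = np.asarray(window, dtype=float)
    if window.ndim != 2 or window.shape[0] < 2:
        raise ValueError("window must be 2D with at least two samples")
    # rowvar=False: columns are channels (variables), rows are time samples.
    cov = np.atleast_2d(np.cov(window, rowvar=False))
    lo, hi = cov.min(), cov.max()
    if hi == lo:
        # Degenerate case (constant matrix): return a blank image.
        return np.zeros_like(cov)
    return (cov - lo) / (hi - lo)


def sliding_windows(signal: np.ndarray, size: int, step: int):
    """Yield consecutive temporal segments of `size` samples, `step` apart."""
    for start in range(0, signal.shape[0] - size + 1, step):
        yield signal[start:start + size]


# Synthetic stand-in for a multi-sensor recording: 256 samples, 6 channels.
rng = np.random.default_rng(0)
signal = rng.normal(size=(256, 6))

# One fused 2D representation per temporal segment.
images = np.stack(
    [covariance_image(w) for w in sliding_windows(signal, size=64, step=32)]
)
print(images.shape)  # (7, 6, 6)
```

Note that the size of each fused image depends only on the number of channels, not on the segment length, which is what keeps the input to the classifier low-dimensional regardless of how many samples each window contains.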