Estimating mechanical properties of cloth from videos using dense motion trajectories: human psychophysics and machine learning
Humans can visually estimate the mechanical properties of deformable objects (e.g., cloth stiffness). While much of the recent work on material perception has focused on static image cues (e.g., texture and shape), little is known about whether humans can integrate information over time to make such judgments. Here, we investigate the effect of spatiotemporal information across multiple frames (multi-frame motion) on estimating the bending stiffness of cloth. Using high-fidelity cloth animations, we first examined how perceived bending stiffness changed as a function of the physical bending stiffness defined in the simulation model. Using maximum likelihood difference scaling (MLDS), we found that perceived stiffness and physical bending stiffness were highly correlated. In a second experiment, scrambling the frame sequences diminished this correlation, suggesting that multi-frame motion plays an important role. To provide further evidence for this finding, we extracted dense motion trajectories from the videos across 15 consecutive frames and used the trajectory descriptors to train a machine-learning model on the measured perceptual scales. The model can predict human perceptual scales in new videos with varied winds, optical properties of cloth, and scene setups. When the correct multi-frame motion information was removed (by training the model on either scrambled videos or 2-frame optical flow), the predictions worsened significantly. Our findings demonstrate that multi-frame motion information is important for both humans and machines in estimating mechanical properties. In addition, we show that dense motion trajectories are effective features for building a successful automatic system for estimating the mechanical properties of cloth.
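
As a rough illustration of why frame order matters for trajectory-based features, the sketch below computes a normalized displacement descriptor for a single tracked point over 15 frames, once in the original order and once with the frames scrambled. The abstract does not specify the descriptor, so the displacement-based form used here is an assumption, and the function names, the synthetic track, and the random seed are all hypothetical.

```python
import math
import random


def trajectory_descriptor(points):
    """Normalized displacement descriptor for one tracked point (assumed form).

    points: list of (x, y) positions, one per consecutive frame.
    Returns a flat list of 2 * (len(points) - 1) values: the frame-to-frame
    displacements divided by the total path length.
    """
    displacements = [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    total = sum(math.hypot(dx, dy) for dx, dy in displacements)
    if total == 0.0:
        # A stationary point carries no motion information.
        return [0.0] * (2 * len(displacements))
    return [component / total for d in displacements for component in d]


def scramble(points, seed=0):
    """Return the same positions in a shuffled frame order (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    return shuffled


# Synthetic track (illustration only): a point drifting right while oscillating.
track = [(float(t), math.sin(0.5 * t)) for t in range(15)]

ordered = trajectory_descriptor(track)
scrambled = trajectory_descriptor(scramble(track))
```

Both descriptors are built from the same 15 positions, yet they differ because the displacements depend on which frame follows which. This is the temporal structure that scrambling removes, and a 2-frame optical flow feature would capture only one such displacement at a time.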