🤖 AI Summary
This study addresses the challenge of accurately predicting haptic texture perception attributes to enhance realism in VR/AR interaction and robotic understanding of physical surfaces. To overcome the poor generalizability of existing unimodal approaches, we first construct a psychophysically calibrated four-dimensional haptic perceptual space spanning the rough–smooth, flat–bumpy, sticky–slippery, and hard–soft dimensions, and then propose a vision–haptics bimodal joint mapping framework. Specifically, a CNN-based autoencoder extracts visual texture features while a ConvLSTM models temporal haptic signals, and multi-feature fusion enables cross-modal regression of perceptual scores. Under leave-one-out cross-validation, our method achieves significantly lower MAE and RMSE than unimodal baselines and demonstrates strong generalization to unseen textures. This work establishes a novel, interpretable, and transferable paradigm for haptic perception modeling, advancing embodied intelligence and immersive human–machine interaction.
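For concreteness, the sketch below shows one way such a bimodal mapping could be wired up in PyTorch: a small convolutional encoder standing in for the encoder half of the CNN-based autoencoder, a 1D ConvLSTM cell unrolled over windowed tactile frames, and a fusion head regressing the four attribute scores. Every module name, layer size, and the assumption that tactile input arrives as a sequence of short 1D frames is illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ConvLSTM1dCell(nn.Module):
    """Minimal 1D ConvLSTM cell: convolutional input/forget/output/candidate gates."""
    def __init__(self, in_ch, hid_ch, k=5):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv1d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):              # x: (B, in_ch, L)
        h, c = state                          # each (B, hid_ch, L)
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class BimodalAttributeRegressor(nn.Module):
    """Fuses a visual embedding and a tactile embedding to predict four attribute scores."""
    def __init__(self, hid_ch=16, emb=64):
        super().__init__()
        # Visual branch: encoder half of a CNN autoencoder (decoder omitted in this sketch).
        self.vis_enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, emb),
        )
        # Tactile branch: ConvLSTM over a sequence of short 1D signal frames.
        self.cell = ConvLSTM1dCell(in_ch=1, hid_ch=hid_ch)
        self.tac_proj = nn.Linear(hid_ch, emb)
        # Fusion head: concatenated embeddings -> 4 perceptual ratings.
        self.head = nn.Sequential(nn.Linear(2 * emb, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, image, tactile):        # image: (B,3,H,W); tactile: (B,T,1,L)
        B, T, _, L = tactile.shape
        h = tactile.new_zeros(B, self.cell.hid_ch, L)
        c = tactile.new_zeros(B, self.cell.hid_ch, L)
        for t in range(T):                    # unroll the ConvLSTM over tactile frames
            h, c = self.cell(tactile[:, t], (h, c))
        tac = self.tac_proj(h.mean(dim=-1))   # pool over signal length
        vis = self.vis_enc(image)
        return self.head(torch.cat([vis, tac], dim=1))

model = BimodalAttributeRegressor()
scores = model(torch.randn(2, 3, 64, 64), torch.randn(2, 8, 1, 128))
print(scores.shape)  # torch.Size([2, 4]) -> four attribute ratings per sample
```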
📝 Abstract
Accurate prediction of the perceptual attributes of haptic textures is essential for advancing VR and AR applications and for enhancing robotic interaction with physical surfaces. This paper presents a deep learning-based multi-modal framework that combines visual and tactile data to predict perceptual texture ratings from multi-feature inputs. A four-dimensional haptic attribute space spanning the rough–smooth, flat–bumpy, sticky–slippery, and hard–soft dimensions is first constructed through psychophysical experiments in which participants rate 50 diverse real-world texture samples. A physical signal space is then created by collecting visual and tactile data from the same textures. Finally, a deep learning architecture integrating a CNN-based autoencoder for visual feature learning and a ConvLSTM network for tactile signal processing is trained to predict the user-assigned attribute ratings. This multi-modal, multi-feature approach maps physical signals to perceptual ratings, enabling accurate predictions for unseen textures. Predictive accuracy is assessed with leave-one-out cross-validation, which rigorously tests the model's reliability and generalizability against several machine learning and deep learning baselines. Experimental results show that the framework consistently outperforms single-modality approaches, achieving lower MAE and RMSE and demonstrating the benefit of combining visual and tactile modalities.
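As a point of reference, leave-one-out cross-validation over the 50 textures trains on 49 samples and predicts the held-out one, repeating the process for every texture before computing MAE and RMSE on the pooled predictions. The snippet below is a minimal sketch of that evaluation loop; the `train_fn`/`predict_fn` interface and the placeholder arrays are hypothetical and not part of the paper's codebase.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

def loocv_errors(features, ratings, train_fn, predict_fn):
    """Leave-one-texture-out evaluation: train on N-1 textures, predict the held-out one.

    features: (N, ...) array of per-texture inputs; ratings: (N, 4) attribute scores.
    train_fn(X, y) -> model; predict_fn(model, X) -> (len(X), 4) predictions.
    """
    preds = np.zeros_like(ratings, dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(features):
        model = train_fn(features[train_idx], ratings[train_idx])
        preds[test_idx] = predict_fn(model, features[test_idx])
    err = preds - ratings
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())  # MAE, RMSE

# Example with a trivial baseline that predicts the training-set mean rating.
X = np.random.rand(50, 8)              # placeholder per-texture features
y = np.random.rand(50, 4)              # placeholder attribute ratings
mae, rmse = loocv_errors(X, y,
                         train_fn=lambda X, y: y.mean(axis=0),
                         predict_fn=lambda m, X: np.tile(m, (len(X), 1)))
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}")
```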