B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing

📅 2025-01-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing HOI evaluation benchmarks (e.g., HICO-DET) suffer from severe class imbalance and train/test distribution misalignment, leading to unreliable model assessments and distorted rankings. To address this, we propose B-RIGHT—the first class-balanced, structurally rigorous HOI benchmark—introducing a novel paradigm for holistic evaluation: a three-stage balancing algorithm comprising automated generation, semantic filtering, and zero-shot partitioning, enabling construction of a comprehensive, distributionally balanced training set and a zero-shot test set. We further establish a standardized evaluation protocol. Experiments demonstrate that B-RIGHT significantly reduces model performance variance (average reduction of 37%), corrects prior ranking biases, and enhances fairness and reliability in cross-model comparison. By rectifying fundamental flaws in existing benchmarks, B-RIGHT establishes a new gold standard for HOI evaluation.

Technology Category

Application Category

📝 Abstract
Human-object interaction (HOI) is an essential problem in artificial intelligence (AI) which aims to understand the visual world that involves complex relationships between humans and objects. However, current benchmarks such as HICO-DET face the following limitations: (1) severe class imbalance and (2) varying number of train and test sets for certain classes. These issues can potentially lead to either inflation or deflation of model performance during evaluation, ultimately undermining the reliability of evaluation scores. In this paper, we propose a systematic approach to develop a new class-balanced dataset, Benchmark Re-evaluation for Integrity in Generalized Human-object Interaction Testing (B-RIGHT), that addresses these imbalanced problems. B-RIGHT achieves class balance by leveraging balancing algorithm and automated generation-and-filtering processes, ensuring an equal number of instances for each HOI class. Furthermore, we design a balanced zero-shot test set to systematically evaluate models on unseen scenario. Re-evaluating existing models using B-RIGHT reveals substantial the reduction of score variance and changes in performance rankings compared to conventional HICO-DET. Our experiments demonstrate that evaluation under balanced conditions ensure more reliable and fair model comparisons.
Problem

Research questions and friction points this paper is trying to address.

AI Model Evaluation
Imbalanced Datasets
HICO-DET
Innovation

Methods, ideas, or system contributions that make the work stand out.

Balanced Dataset
Unseen Scenario Testing
Fair Model Comparison
🔎 Similar Papers
No similar papers found.
Y
Yoojin Jang
Ulsan National Institute of Science and Technology (UNIST)
J
Junsu Kim
Ulsan National Institute of Science and Technology (UNIST), NAVER AI LAB
H
Hayeon Kim
Ulsan National Institute of Science and Technology (UNIST)
E
Eun-ki Lee
Hangyang University
E
Eun-sol Kim
Hangyang University
Seungryul Baek
Seungryul Baek
Associate Professor, UNIST
Deep LearningComputer VisionArticulated Pose EstimationAction and Gesture RecognitionObject
Jaejun Yoo
Jaejun Yoo
Associate Professor, Laboratory of Advanced Imaging Technology (LAIT), UNIST
deep learningmachine learninginverse problemmedical imagingsignal processing