Enhancement of RGB-Based Human Action Recognition via Event-Inspired Multitask Learning

Abstract

Conventional human action recognition (HAR) using RGB cameras is often limited by lighting variations and motion blur. Event cameras offer a promising alternative due to their high temporal resolution, but they lack textural detail and are constrained by the scarcity of large-scale datasets. To address these issues, this paper proposes a multi-task learning paradigm that uses event data exclusively as auxiliary supervision during training. Rather than replacing existing recognition architectures, the proposed approach introduces an auxiliary task that transforms RGB data into event-like representations, guiding a shared encoder to learn motion-sensitive features. Because the auxiliary branch is discarded entirely after training, the model operates on RGB input alone at inference, enabling deployment without event sensors. Training uses a loss annealing strategy that gradually shifts focus from the auxiliary task to the primary HAR task. Experiments on five diverse backbones spanning the CNN and transformer families show that the proposed framework improves on the RGB-only baseline in every case, with the largest gains observed on transformer-based models in this setting. For select backbones, performance approaches or slightly exceeds that of models trained on real event data.
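
The loss annealing idea can be illustrated with a minimal sketch. The abstract does not state the schedule or the loss formulation, so everything here is an assumption made for illustration: the linear decay, the convex combination of the two losses, and the hypothetical names `aux_weight` and `combined_loss` are not taken from the paper.

```python
def aux_weight(epoch: int, total_epochs: int,
               start: float = 1.0, end: float = 0.0) -> float:
    """Weight of the auxiliary (RGB-to-event) task at a given epoch.

    Assumption: a linear decay from `start` to `end`; the paper's actual
    schedule may differ.
    """
    if total_epochs <= 1:
        return end
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress


def combined_loss(har_loss: float, aux_loss: float,
                  epoch: int, total_epochs: int) -> float:
    """Blend the two task losses so emphasis moves toward HAR over time.

    Assumption: a convex combination (1 - w) * HAR + w * auxiliary.
    """
    w = aux_weight(epoch, total_epochs)
    return (1.0 - w) * har_loss + w * aux_loss


if __name__ == "__main__":
    # Print how the auxiliary weight changes over an 11-epoch run.
    for epoch in (0, 5, 10):
        print(epoch, aux_weight(epoch, 11))
```

Under these assumptions the auxiliary task dominates early training and contributes nothing by the final epoch, which is consistent with the auxiliary branch being dropped at inference.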

Publication
IEEE Access
Jihwan Won
PhD Student

His research interests include machine learning and deep learning algorithms.

Hanwoong Ryu
Researcher, Selectstar

His research interests include large language models (LLMs), deep learning, computer vision, and time series.

Junghwan Lee
PhD Student

His research interests include machine learning and deep learning algorithms.

Cheolsoo Park
Professor

His research interests include machine learning, adaptive signal processing, computational neuroscience, and wearable technology.