This is the official PyTorch implementation of "Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition". The paper has been accepted to CVPR 2025.
In skeleton-based action recognition, a key challenge is distinguishing between actions with similar trajectories of joints due to the lack of image-level details in skeletal representations. Recognizing that the differentiation of similar actions relies on subtle motion details in specific body parts, we direct our approach to focus on the fine-grained motion of local skeleton components. To this end, we introduce ProtoGCN, a Graph Convolutional Network (GCN)-based model that breaks down the dynamics of entire skeleton sequences into a combination of learnable prototypes representing core motion patterns of action units. By contrasting the reconstruction of prototypes, ProtoGCN can effectively identify and enhance the discriminative representation of similar actions. Without bells and whistles, ProtoGCN achieves state-of-the-art performance on multiple benchmark datasets, including NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM, which demonstrates the effectiveness of the proposed method.
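For intuition, here is a minimal sketch of the prototype reconstruction idea in PyTorch. It is not the actual ProtoGCN implementation; the module and argument names (`PrototypeBank`, `num_prototypes`) are hypothetical, and only the reconstruction step, not the contrastive objective, is shown.

```python
# Minimal, illustrative sketch of prototype reconstruction (NOT the ProtoGCN code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeBank(nn.Module):
    """Reconstructs per-joint features as a soft combination of learnable prototypes."""

    def __init__(self, channels, num_prototypes=64):
        super().__init__()
        # Learnable prototypes, each standing in for a core motion pattern (K x C).
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, channels))
        self.query = nn.Linear(channels, channels)

    def forward(self, x):
        # x: (N, V, C) pooled joint features; V = number of joints.
        q = self.query(x)                                    # (N, V, C)
        # Similarity between each joint feature and each prototype.
        attn = torch.einsum('nvc,kc->nvk', q, self.prototypes)
        attn = F.softmax(attn / x.size(-1) ** 0.5, dim=-1)   # (N, V, K)
        # Reconstruction: prototype-weighted sum for every joint.
        recon = torch.einsum('nvk,kc->nvc', attn, self.prototypes)
        return recon, attn

# Example usage with random features
feats = torch.randn(8, 25, 256)        # batch of 8, 25 joints, 256 channels
recon, attn = PrototypeBank(256)(feats)
print(recon.shape, attn.shape)         # (8, 25, 256) and (8, 25, 64)
```

According to the paper, it is the contrast between such prototype reconstructions that enhances the discriminative representation of similar actions; the sketch above only covers the reconstruction step.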
git clone https://github.com/firework8/ProtoGCN.git
cd ProtoGCN
conda env create -f protogcn.yaml
conda activate protogcn
pip install -e .
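After installation, an optional sanity check (not part of the repo) confirms that PyTorch and the GPU are visible in the environment:

```python
# Optional environment check (not part of the ProtoGCN codebase).
import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a usable GPU is detected
```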
PYSKL provides links to the pre-processed skeleton pickle annotations.
- NTU RGB+D: NTU RGB+D Download Link
- NTU RGB+D 120: NTU RGB+D 120 Download Link
- Kinetics-Skeleton: Kinetics-Skeleton Download Link
- FineGYM: FineGYM Download Link
For Kinetics-Skeleton, since the skeleton annotations are large, please use the Kinetics Annotation Link to download the kpfiles and extract them under $ProtoGCN/data/k400. Note that the kpfiles need to be extracted under Linux. Additionally, Kinetics-Skeleton requires Memcached as a dependency to run, which can be set up by referring to here.
You can check the official Data Doc of PYSKL for more detailed instructions.
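Once an annotation pickle has been downloaded, it can be inspected with a few lines of Python. The snippet below assumes the PYSKL-style layout (a dict with `split` and `annotations` keys); the file path is a placeholder and field names may vary, so treat them as assumptions.

```python
# Hedged inspection of a downloaded annotation pickle, assuming the PYSKL layout.
import pickle

with open('data/nturgbd/ntu60_3danno.pkl', 'rb') as f:  # placeholder path
    data = pickle.load(f)

print(data.keys())                  # expected: dict_keys(['split', 'annotations'])
print(list(data['split'].keys()))   # e.g. ['xsub_train', 'xsub_val', ...]
sample = data['annotations'][0]
print(sample.keys())                # per-sample fields, e.g. 'frame_dir', 'label', 'keypoint'
print(sample['keypoint'].shape)     # typically (num_persons, num_frames, num_joints, channels)
```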
Please modify the config file according to your needs. You can use the following commands for training and testing. We support distributed training on a single server with multiple GPUs.
# Training
bash tools/dist_train.sh {config_name} {num_gpus} {other_options}
# For example: train on NTU RGB+D X-Sub (Joint Modality) with 1 GPU, with validation, and test the last and the best checkpoints.
bash tools/dist_train.sh configs/ntu60_xsub/j.py 1 --validate --test-last --test-best
# Testing
bash tools/dist_test.sh {config_name} {checkpoint_file} {num_gpus} {other_options}
# For example: test on NTU RGB+D X-Sub (Joint Modality) with metrics `top_k_accuracy`, and dump the result to `result.pkl`.
bash tools/dist_test.sh configs/ntu60_xsub/j.py checkpoints/CHECKPOINT.pth 1 --eval top_k_accuracy --out result.pkl
# Ensemble the results
cd tools
python ensemble.py
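`tools/ensemble.py` fuses the per-modality scores (e.g., joint, bone, and their motion variants) produced by testing with `--out result.pkl`. The sketch below shows the common weighted score-level fusion; it is a simplified stand-in for the repo's script, and the paths, weights, and label file are placeholders.

```python
# Simplified stand-in for score-level ensembling; paths, weights, and labels are placeholders.
import pickle
import numpy as np

result_files = {
    'joint': 'work_dirs/ntu60_xsub/j/result.pkl',
    'bone': 'work_dirs/ntu60_xsub/b/result.pkl',
}
weights = {'joint': 1.0, 'bone': 1.0}

with open('data/ntu60_xsub_val_labels.pkl', 'rb') as f:  # placeholder label file
    labels = np.array(pickle.load(f))

fused = None
for name, path in result_files.items():
    with open(path, 'rb') as f:
        scores = np.array(pickle.load(f))   # (num_samples, num_classes) class scores
    fused = weights[name] * scores if fused is None else fused + weights[name] * scores

top1 = (fused.argmax(axis=1) == labels).mean()
print(f'Ensemble Top-1 accuracy: {top1:.4f}')
```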
All the checkpoints can be downloaded from here.
For the detailed performance of pretrained models, please go to the Model Doc.
This repo is mainly based on PYSKL. We also refer to MS-G3D, CTR-GCN, and FR-Head.
Thanks to the original authors for their excellent work!
If you find ProtoGCN useful in your research, please consider citing our paper:
@article{liu2024revealing,
title={Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition},
author={Liu, Hongda and Liu, Yunfan and Ren, Min and Wang, Hao and Wang, Yunlong and Sun, Zhenan},
journal={arXiv preprint arXiv:2411.18941},
year={2024}
}
For any questions, feel free to contact: [email protected]