SignBart: A New Approach for Isolated Sign Language Recognition

SignBart introduces a novel method for Isolated Sign Language Recognition (ISLR) using skeleton sequences, focusing on decoupling the x and y coordinates and leveraging a lightweight encoder-decoder architecture based on BART.

Highlights

Independent encoding of coordinates: x and y coordinates are independently encoded to better capture their unique spatial characteristics.
Cross-Attention mechanism: Cross-Attention integrates information between x and y after independent encoding.
Lightweight model: As few as ~750K parameters (749,888 in the LSA-64 configuration), several times smaller than the skeleton-based baselines compared below.
Strong generalization: Delivers competitive accuracy across diverse datasets, including LSA-64, WLASL, and ASL-Citizen.
Efficient skeleton sequence processing: Lower computational cost than RNN-, LSTM-, and GCN-based models.
Strong ablation results: Highlights the importance of normalization, coordinate projection, and multi-part skeleton input.

About the Model

SignBart addresses the limitations of treating skeleton keypoints as inseparable x-y pairs. Instead, it proposes:

Separate Coordinate Encoding:
- x-coordinates are encoded by the Encoder.
- y-coordinates are encoded by the Decoder.
Attention Mechanisms:
- Self-Attention for x-coordinate encoding, allowing rich bidirectional context learning.
- Causal self-attention for y-coordinate encoding, preserving temporal causality.
- Cross-attention to integrate information from x into y, so the decoder can model the dependency between the two coordinates.
Input Format:
- Skeleton data extracted using Mediapipe.
- Shape: (T, 75, 2), where T = number of frames, 75 = number of keypoints per frame (body, left hand, and right hand), and 2 = (x, y).
Normalization:
- Each component (body, left hand, right hand) normalized independently based on its local bounding box.
- Enhances model generalization and reduces overfitting.
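The per-component normalization can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes a hypothetical keypoint layout passed in as a `parts` dictionary of index lists, computes each part's bounding box per frame (whether SignBart uses per-frame or per-sequence boxes is an assumption here), and treats (0, 0) as a missing keypoint that stays at zero.

```python
import numpy as np


def normalize_parts(skeleton, parts, eps=1e-6):
    """Normalize each skeleton part by its own per-frame bounding box.

    skeleton: (T, K, 2) array of (x, y); missing keypoints are (0, 0).
    parts: dict mapping part name -> list of keypoint indices (hypothetical layout).
    """
    out = skeleton.astype(float).copy()
    for idx in parts.values():
        for t in range(out.shape[0]):
            pts = out[t, idx]  # (n, 2); fancy indexing returns a copy
            present = np.any(pts != 0, axis=1)  # (0, 0) marks a missing keypoint
            if not present.any():
                continue  # whole part missing in this frame: leave zeros
            lo = pts[present].min(axis=0)
            hi = pts[present].max(axis=0)
            # Rescale the part's box to [0, 1]; eps guards degenerate boxes.
            scaled = (pts - lo) / np.maximum(hi - lo, eps)
            scaled[~present] = 0.0  # keep missing keypoints at (0, 0)
            out[t, idx] = scaled
    return out
```

Normalizing each part separately means a hand's shape is described relative to the hand itself, independent of where it sits in the frame, which is the property the three-bounding-box ablation below measures.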
Projection:
- Before entering attention layers, keypoints are linearly projected to a higher-dimensional space (d_model) to enrich feature representation.
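To make the x/y data flow concrete, here is a single-layer, untrained NumPy sketch of the attention pattern described above: projection, bidirectional self-attention over x, causal self-attention over y, then cross-attention from y to x. It assumes each frame is one token whose K x-values (or y-values) are projected to d_model; the random projection matrices and the function names are hypothetical stand-ins for learned weights. A full BART layer also contains layer normalization and feed-forward sublayers, and the classification head is not shown.

```python
import numpy as np


def softmax(scores):
    # Numerically stable softmax over the last axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def attention(q, k, v, causal=False):
    # Single-head scaled dot-product attention; q is (Tq, d), k and v are (Tk, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Hide future frames so position t only attends to positions <= t.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v


def xy_attention_sketch(skeleton, d_model=16, seed=0):
    """Untrained, single-layer sketch of the x/y attention pattern.

    skeleton: (T, K, 2) array of normalized (x, y) keypoints.
    Returns decoder-side features of shape (T, d_model).
    """
    rng = np.random.default_rng(seed)
    _, K, _ = skeleton.shape
    x_seq = skeleton[..., 0]  # (T, K): x coordinates, fed to the encoder
    y_seq = skeleton[..., 1]  # (T, K): y coordinates, fed to the decoder

    # Projection to d_model; random matrices stand in for learned weights.
    x_h = x_seq @ rng.standard_normal((K, d_model))
    y_h = y_seq @ rng.standard_normal((K, d_model))

    # Encoder: bidirectional self-attention over the x stream (residual added).
    enc = x_h + attention(x_h, x_h, x_h)
    # Decoder: causal self-attention over the y stream ...
    dec = y_h + attention(y_h, y_h, y_h, causal=True)
    # ... followed by cross-attention, where y queries read the encoded x stream.
    return dec + attention(dec, enc, enc)
```

Note that only the decoder's self-attention is causal: through cross-attention, every y position can still read the entire encoded x sequence.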

Dataset and Keypoints Extraction

Dataset	Videos	Words	Signers	Language
LSA-64	3,200	64	10	Argentinian Sign Language
WLASL	21,083	2,000	119	American Sign Language
ASL-Citizen	>84,000	2,731	52	American Sign Language (community-sourced)

Keypoint Extraction Process	Details
Extraction Tool	Google Mediapipe
Keypoints	6 body + full left & right hand
Missing Keypoints	Filled with (0, 0)
Coordinate Normalization	Scaled to [0, 1] relative to frame size
Further Normalization	Local bounding boxes for body, left hand, and right hand
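Mediapipe's landmark x and y values are already normalized by image width and height, so the remaining extraction step is mostly stacking frames and zero-filling parts the detector missed. The sketch below assumes a hypothetical per-frame input (a dict of part name to a list of (x, y) tuples, or None when the part was not detected) and uses the part sizes from the table above (6 body + 21 + 21); adjust `part_sizes` if your keypoint layout differs.

```python
import numpy as np

# Part sizes taken from the extraction table; the ordering is an assumption.
PART_SIZES = {"body": 6, "left_hand": 21, "right_hand": 21}


def frames_to_array(frames, part_sizes=PART_SIZES):
    """Stack per-frame keypoints into a (T, K, 2) array, zero-filling gaps.

    frames: list of dicts mapping part name -> list of (x, y) in [0, 1],
            or None when the detector found nothing for that part.
    """
    K = sum(part_sizes.values())
    out = np.zeros((len(frames), K, 2), dtype=np.float32)
    for t, frame in enumerate(frames):
        start = 0
        for name, size in part_sizes.items():
            points = frame.get(name)
            if points is not None:
                out[t, start:start + size] = np.asarray(points, dtype=np.float32)
            start += size  # missing parts stay at (0, 0)
    return out
```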

Pretrained Weights

Name	Weight	Config
LSA-64	LSA-64.pth	LSA-64.yaml
WLASL-100	WLASL-100.pth	WLASL-100.yaml
WLASL-300	WLASL-300.pth	WLASL-300.yaml
WLASL-1000	WLASL-1000.pth	WLASL-1000.yaml
WLASL-2000	WLASL-2000.pth	WLASL-2000.yaml
ASL-Citizen-100	ASL-Citizen-100.pth	ASL-Citizen-100.yaml
ASL-Citizen-200	ASL-Citizen-200.pth	ASL-Citizen-200.yaml
ASL-Citizen-400	ASL-Citizen-400.pth	ASL-Citizen-400.yaml
ASL-Citizen-1000	ASL-Citizen-1000.pth	ASL-Citizen-1000.yaml
ASL-Citizen-2731	ASL-Citizen-2731.pth	ASL-Citizen-2731.yaml

Installation

Prerequisites

Python >= 3.8
pip

Setup

# Clone and enter project
git clone https://github.com/tinh2044/SignBart.git
cd SignBart

# (Optional) create virtual environment
python -m venv venv
# macOS/Linux: source venv/bin/activate
# Windows PowerShell: venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

Data Preparation

Download each dataset (LSA-64, WLASL, ASL-Citizen) from its source.

Extract into data/ with structure:

data/lsa-64/{label2id.json,id2label.json,train/,test/}
data/wlasl/{...}
data/asl-citizen/{...}
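Before training, it can help to confirm that a dataset folder matches the layout above. This is a hypothetical helper, not part of the repository; it assumes `label2id.json` maps each label to an integer id and `id2label.json` maps the id (as a JSON string key) back to the label.

```python
import json
from pathlib import Path


def check_dataset_dir(root):
    """Return a list of problems found in a dataset folder (empty if OK).

    Expects label2id.json, id2label.json, train/ and test/ under root.
    Assumes label2id maps label -> integer id and id2label maps id -> label.
    """
    root = Path(root)
    problems = []
    for name in ("label2id.json", "id2label.json"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    for name in ("train", "test"):
        if not (root / name).is_dir():
            problems.append(f"missing {name}/")
    if problems:
        return problems
    label2id = json.loads((root / "label2id.json").read_text())
    id2label = json.loads((root / "id2label.json").read_text())
    # JSON object keys are always strings, so look ids up as strings.
    for label, idx in label2id.items():
        if id2label.get(str(idx)) != label:
            problems.append(f"label {label!r} does not round-trip through id {idx}")
    return problems
```

For example, `check_dataset_dir("data/lsa-64")` should return an empty list once the folder is in place.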

Usage

Below are two ways to run SignBart: via the provided shell scripts or by calling main.py directly.

Using shell scripts

The scripts/ directory includes dataset-specific training and evaluation scripts. For example:

# Training on LSA-64
bash scripts/train_LSA-64.sh
# Evaluation on LSA-64
bash scripts/eval_LSA-64.sh

Substitute another dataset name to run the corresponding script (e.g., train_WLASL-100.sh, eval_ASL-Citizen-100.sh).

Using Python entry point

Train:

python main.py --task train \
  --experiment_name my_experiment \
  --config_path configs/lsa-64.yaml \
  --data_path data/lsa-64 \
  --epochs 200 \
  --lr 2e-5 \
  --seed 379

Evaluate:

python main.py --task eval \
  --experiment_name my_experiment \
  --config_path configs/lsa-64.yaml \
  --pretrained_path checkpoints/my_experiment/epoch_X.pth \
  --data_path data/lsa-64 \
  --seed 379

Optional flags:

--resume_checkpoints PATH
--scheduler_factor FACTOR
--scheduler_patience PATIENCE
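When scripting several runs (for example, sweeping seeds), the commands above can be built programmatically. The helper below is a hypothetical convenience wrapper that only uses the flags documented in this section; it builds the argument list without assuming anything about how main.py parses it.

```python
import shlex
import sys


def main_py_command(task, experiment_name, config_path, data_path, seed=379, **extra):
    """Build an argument list for main.py using the flags documented above.

    extra: additional flags, e.g. epochs=200, lr=2e-5, pretrained_path="...".
    """
    cmd = [sys.executable, "main.py", "--task", task,
           "--experiment_name", experiment_name,
           "--config_path", config_path,
           "--data_path", data_path,
           "--seed", str(seed)]
    for key, value in extra.items():
        cmd += [f"--{key}", str(value)]
    return cmd


if __name__ == "__main__":
    # Print one training command per seed (hypothetical experiment names).
    for seed in (379, 380, 381):
        cmd = main_py_command("train", f"lsa64_seed{seed}", "configs/lsa-64.yaml",
                              "data/lsa-64", seed=seed, epochs=200, lr=2e-5)
        print(shlex.join(cmd))
```

To launch a run instead of printing it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.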

Experiments and Results

LSA-64 Dataset

Model	Accuracy	Parameters
SPOTER	100%	5,918,848
HWGATE	98.59%	10,758,354
ST-GCN	92.81%	3,604,180
SL-GCN	98.13%	4,872,306
SignBart	96.04%	749,888
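The size gap in this table is easier to judge as ratios. The short calculation below uses only the parameter counts listed above and shows each baseline is roughly 4.8 to 14.3 times larger than SignBart.

```python
# Parameter counts copied from the LSA-64 table above.
PARAMS = {
    "SPOTER": 5_918_848,
    "HWGATE": 10_758_354,
    "ST-GCN": 3_604_180,
    "SL-GCN": 4_872_306,
    "SignBart": 749_888,
}


def size_ratios(params, reference="SignBart"):
    """How many times larger each model is than the reference model."""
    ref = params[reference]
    return {name: round(count / ref, 2)
            for name, count in params.items() if name != reference}


print(size_ratios(PARAMS))
```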

WLASL Dataset

Subset	SignBart Accuracy
WLASL-100	78.00%
WLASL-300	78.50%
WLASL-1000	81.45%
WLASL-2000	68.00%

ASL-Citizen Dataset

Subset	Accuracy	Parameters
ASL-Citizen-100	80.32%	754,532
ASL-Citizen-200	81.49%	2,845,384
ASL-Citizen-400	78.96%	3,424,144
ASL-Citizen-1000	81.45%	3,578,344
ASL-Citizen-2731	75.22%	4,548,523

Ablation Studies

Projection Effect	Accuracy
Without projection	62.08%
With projection	96.04%

Normalization Effect	Accuracy
No normalization	82.50%
One bounding box	90.52%
Two bounding boxes	90.41%
Three bounding boxes (body, left hand, right hand)	96.04%

Skeleton Components	Accuracy
Only body	86.97%
Only left hand	23.02%
Only right hand	70.20%
Both hands	91.35%
All components (body + left + right)	96.04%
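All three ablation tables share the full model's 96.04% (the same figure as the LSA-64 result), so each variant can be read as a percentage-point drop from that baseline. The snippet below ranks the drops using only the numbers listed above: the left hand alone loses about 73 points, removing projection about 34, while both hands without the body lose under 5.

```python
FULL_MODEL = 96.04  # all components, projection, three bounding boxes

# Accuracies copied from the ablation tables above.
ABLATIONS = {
    "Without projection": 62.08,
    "No normalization": 82.50,
    "One bounding box": 90.52,
    "Two bounding boxes": 90.41,
    "Only body": 86.97,
    "Only left hand": 23.02,
    "Only right hand": 70.20,
    "Both hands": 91.35,
}


def accuracy_drops(ablations, full=FULL_MODEL):
    """Percentage-point drop of each variant versus the full model, largest first."""
    drops = {name: round(full - acc, 2) for name, acc in ablations.items()}
    return dict(sorted(drops.items(), key=lambda item: item[1], reverse=True))


for name, drop in accuracy_drops(ABLATIONS).items():
    print(f"{name}: -{drop:.2f} pts")
```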
