Scalable annotation pipeline for action-aligned, fine-grained instructions for Vision-Language-Action models
benchmark caption vla fine-grained vlm caption-generation vision-language-action-model roboitcs steerable
Updated Jun 11, 2026 - Python