feat(policies): Add X-VLA #2405
Conversation
Pull Request Overview
This PR adds XVLA (Extended Vision-Language-Action) policy support to LeRobot. XVLA is a multi-modal policy that combines vision, language, and proprioceptive inputs with a domain-aware transformer architecture for robot manipulation tasks.
Key changes:
- Implements XVLA policy with Florence-2 vision-language backbone and soft-prompted transformer
- Adds domain-aware action spaces (EE6D, Joint, AGIBOT) with specialized loss functions
- Integrates XVLA into the LeRobot policy factory and configuration system
Reviewed Changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `train.sh` | Training script for XVLA with wandb and dataset configuration |
| `test_xvla.py` | Test script to instantiate and verify the XVLA policy |
| `src/lerobot/policies/xvla/transformer.py` | Core transformer architecture with domain-aware layers and soft prompts |
| `src/lerobot/policies/xvla/processing_xvla.py` | Multi-modal processor for images and language with padding/masking |
| `src/lerobot/policies/xvla/modeling_xvla.py` | Main policy class implementing the training/inference pipeline |
| `src/lerobot/policies/xvla/modeling_florence2.py` | Florence-2 vision-language model (encoder/decoder) |
| `src/lerobot/policies/xvla/configuration_xvla.py` | XVLA configuration with Florence-2 integration |
| `src/lerobot/policies/xvla/configuration_florence2.py` | Florence-2 model configuration classes |
| `src/lerobot/policies/xvla/action_hub.py` | Action space registry with EE6D, Joint, AGIBOT variants |
| `src/lerobot/policies/factory.py` | Factory integration for XVLA policy creation |
| `src/lerobot/policies/__init__.py` | Export of the XVLA configuration |
2toinf left a comment:
We recommend not freezing the vision and language encoders by default, as this approach may not align with the official implementation. In fact, freezing these two components often leads to a performance drop. We have observed that unfreezing them results in better task adaptation.
Additionally, we strongly advise applying a custom learning rate (typically 1/10th of the learning rate used for the VLM) during training, as suggested in the paper. This adjustment helps achieve optimal performance during fine-tuning.
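For illustration, a minimal sketch of such a parameter-group split, assuming the VLM submodules live under a hypothetical `model.florence2` prefix and that `policy` is the instantiated XVLA policy:

```python
import torch

# Hypothetical sketch: give the pretrained VLM 1/10th of the base learning
# rate, as suggested in the X-VLA paper. The "model.florence2" prefix is an
# assumption; adjust it to the real module tree.
base_lr = 1e-4
vlm_params, other_params = [], []
for name, param in policy.named_parameters():
    if not param.requires_grad:
        continue
    if name.startswith("model.florence2"):
        vlm_params.append(param)
    else:
        other_params.append(param)

optimizer = torch.optim.AdamW(
    [
        {"params": vlm_params, "lr": base_lr / 10},  # smaller LR for the pretrained VLM
        {"params": other_params, "lr": base_lr},  # full LR for the policy transformer / soft prompts
    ]
)
```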
Further, I’d like to check X-VLA’s performance after post-training with the LeRobot pipeline. Does it match the officially reported results?
Hello @2toinf, yes, this is standard in LeRobot: we run a reproducibility check where we compare the expected outputs of the preprocessor with those produced by the LeRobot implementation, and we also compare the expected logits of the produced actions with those from the original implementation, along with our Libero benchmark checker.
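For illustration, such a check might be sketched as follows; the reference file path and the `predict_action_chunk` method name are assumptions, not the actual test code:

```python
import torch

# Illustrative reproducibility check: compare LeRobot outputs against tensors
# saved from the original X-VLA implementation. Paths, method name, and
# tolerances below are made up for the sketch.
expected_actions = torch.load("reference/original_xvla_actions.pt")

policy.eval()
with torch.no_grad():
    produced_actions = policy.predict_action_chunk(batch)  # hypothetical method name

torch.testing.assert_close(produced_actions, expected_actions, atol=1e-4, rtol=1e-4)
```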
Added detailed instructions for implementing a custom optimizer and modifying parameter retrieval for X-VLA finetuning. Signed-off-by: Jinliang Zheng <[email protected]>
michel-aractingi left a comment:
Overall, great work Jade. The PR is very close to being ready.
| "ninja>=1.11.1,<2.0.0", | ||
| "flash-attn>=2.5.9,<3.0.0 ; sys_platform != 'darwin'" | ||
| ] | ||
| xlva = ["lerobot[transformers-dep]"] |
`xlva` is a typo.
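i.e., the extra should presumably read `xvla = ["lerobot[transformers-dep]"]`.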
```python
instance = cls(config, **kwargs)
# step 2: locate model.safetensors
if os.path.isdir(model_id):
    print("Loading weights from local directory")
```
use logging.info instead of print
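For example:

```python
import logging

logging.info("Loading weights from local directory")  # instead of print(...)
```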
```python
except HfHubHTTPError as e:
    raise FileNotFoundError(f"model.safetensors not found on the Hub at {model_id}") from e

print(f"Loading checkpoint from {model_file}")
```
logging.info instead of print
```python
# or deepcopy
# step 4: load into instance
instance.load_state_dict(state_dict, strict=True)
print("Loaded XVLA checkpoint")
```
same here
| """ | ||
|
|
||
| domain_id: int = 0 | ||
| device: str = "cuda" |
This seems hardcoded; can I run XVLA if I don't have a GPU?
Can we use DeviceProcessorStep, since it's already the next step in the pipeline?
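One way to avoid hardcoding CUDA would be a device default that falls back to CPU; a sketch with an illustrative config class name:

```python
from dataclasses import dataclass

import torch


@dataclass
class XVLAProcessorConfig:  # illustrative name, not the actual class
    domain_id: int = 0
    # Fall back to CPU when CUDA is unavailable instead of hardcoding "cuda".
    device: str = "cuda" if torch.cuda.is_available() else "cpu"
```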
```python
if obs:
    for v in obs.values():
        if isinstance(v, torch.Tensor):
            batch_size = v.shape[0]
```
You can probably infer the device from obs? `device = v.device`
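A sketch of inferring both values from the first tensor found in `obs` (variable names follow the quoted snippet):

```python
import torch

# Sketch: derive batch size and device from whatever tensor is present in obs,
# rather than hardcoding the device in the config.
batch_size, device = 1, torch.device("cpu")
for v in obs.values():
    if isinstance(v, torch.Tensor):
        batch_size = v.shape[0]
        device = v.device
        break
```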
```python
)


if is_flash_attn_2_available():
```
Second `if is_flash_attn...` condition, duplicated from line 56.
| """The FLORENCE2 vision model without any head""", | ||
| FLORENCE2_START_DOCSTRING, | ||
| ) | ||
| class Florence2VisionModel(Florence2PreTrainedModel): |
This class seems unused? Remove it if true.
| """The FLORENCE2 vision model with projection layer""", | ||
| FLORENCE2_START_DOCSTRING, | ||
| ) | ||
| class Florence2VisionModelWithProjection(Florence2PreTrainedModel): |
Also, is this class used anywhere? Remove it if not.
```python
if len(self._queues[ACTION]) == 0:
    actions = self._get_action_chunk(batch)
    self._queues[ACTION].extend(actions.transpose(0, 1)[: self.config.n_action_steps])
```
Verify that the actions are trimmed according to the requested action space, as I had to manually trim them in the real-robot test.
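For illustration, a guard along these lines could trim the chunk before queueing; the `action_dim` field name is an assumption, not the actual config attribute:

```python
# Hypothetical guard: trim the predicted chunk to the action dimension of the
# requested action space before queueing. `action_dim` is an assumed field name.
actions = self._get_action_chunk(batch)
actions = actions[..., : self.config.action_dim]
self._queues[ACTION].extend(actions.transpose(0, 1)[: self.config.n_action_steps])
```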
What this does
feat(policies): Add X-VLA
X-VLA was proposed here: https://thu-air-dream.github.io/X-VLA/ and won Champion @ AgiBot World Challenge @ IROS 2025.
This is the full integration of it inside LeRobot by the LeRobot team.
Libero also got updated:
1. It can handle different control modes (delta vs. absolute).
2. You can now specify the max episode length; otherwise it falls back to a default depending on the task suite you choose.
TODO:
- Train and evaluate on Libero and report success rate
- Test on a real-world task, like picking and transferring a cube
- Add testing

For finetuning / training:
❄️ VLM vision encoder: FROZEN
❄️ VLM language encoder: FROZEN
🔥 Policy transformer: TRAINABLE
🔥 Soft prompts: TRAINABLE
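For reference, a sketch of how this split could be applied; the submodule names below are assumptions, not the actual attribute paths:

```python
# Illustrative freeze/train split matching the list above; submodule names
# are assumptions, not the real XVLA attribute paths.
for p in policy.model.vision_encoder.parameters():
    p.requires_grad_(False)  # VLM vision encoder: frozen
for p in policy.model.language_encoder.parameters():
    p.requires_grad_(False)  # VLM language encoder: frozen
for p in policy.model.transformer.parameters():
    p.requires_grad_(True)  # policy transformer: trainable
for p in policy.model.soft_prompts.parameters():
    p.requires_grad_(True)  # soft prompts: trainable
```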