Applying RDT to the WidowX Bridge tasks in SimplerEnv, and some pretraining details of RDT on the BridgeV2 dataset #83

Open
pancake-w opened this issue Feb 27, 2025 · 0 comments

Comments

@pancake-w

In the maniskill3 branch of SimplerEnv, we would like to apply RDT to the 'widowx bridge' series of tasks. However, the environment only supports the WidowX robot with a delta ee_pose controller, not an absolute ee_pose controller.

I found that during pretraining, the input to RDT is "arm_joint_0_pos,arm_joint_1_pos,arm_joint_2_pos,arm_joint_3_pos,arm_joint_4_pos,arm_joint_5_pos,gripper_joint_0_pos,eef_pos_x,eef_pos_y,eef_pos_z,eef_angle_0,eef_angle_1,eef_angle_2,eef_angle_3,eef_angle_4,eef_angle_5", but I can't find the output data format [action["format"]].
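For reference, the format string above can be split into its 16 named fields and grouped to check the dimensions. This is only a counting sketch: the group labels (`joint_pos`, `gripper`, `eef_pos`, `eef_angle`) and the helper `group_fields` are illustrative names, not identifiers from the RDT codebase, and the field order is simply the order in the string.

```python
from collections import OrderedDict

# Input format string quoted from the Bridge V2 pretraining config.
STATE_FORMAT = (
    "arm_joint_0_pos,arm_joint_1_pos,arm_joint_2_pos,arm_joint_3_pos,"
    "arm_joint_4_pos,arm_joint_5_pos,gripper_joint_0_pos,"
    "eef_pos_x,eef_pos_y,eef_pos_z,"
    "eef_angle_0,eef_angle_1,eef_angle_2,eef_angle_3,eef_angle_4,eef_angle_5"
)

# Hypothetical prefix -> group label mapping, used only for counting.
PREFIX_TO_GROUP = OrderedDict([
    ("arm_joint_", "joint_pos"),
    ("gripper_joint_", "gripper"),
    ("eef_pos_", "eef_pos"),
    ("eef_angle_", "eef_angle"),
])


def group_fields(format_string):
    """Map each group label to the list of field indices it occupies."""
    groups = OrderedDict((label, []) for label in PREFIX_TO_GROUP.values())
    for index, name in enumerate(format_string.split(",")):
        for prefix, label in PREFIX_TO_GROUP.items():
            if name.startswith(prefix):
                groups[label].append(index)
                break
        else:
            raise ValueError(f"unrecognized field: {name}")
    return groups


groups = group_fields(STATE_FORMAT)
for label, indices in groups.items():
    print(label, len(indices), indices)
```

The 3 position fields plus the 6 angle fields give the 9-dimensional ee_pose.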

According to #14, I guessed that the output data is the future absolute state (joint_pos (6), gripper (1), and ee_pose (9)). So I tried taking the predicted 6-dimensional joint_pos of the WidowX, solving FK to get the ee_pose, and finally sending delta_ee_pose = predict_ee_pose - last_obs_ee_pose to the delta ee_pose controller, which is the only controller we have. [I also tried using the predicted ee_pose (9) directly, but its performance is very poor, and the resulting delta ee_pose is an order of magnitude larger.]
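The delta computation described above can be sketched as follows. This is a minimal illustration, not the RDT or SimplerEnv implementation, and it rests on several assumptions: the 9-dimensional ee_pose is laid out as position (3) followed by a 6D rotation; the 6D rotation holds the first two columns of the rotation matrix; and the controller takes a position delta plus an axis-angle rotation delta expressed in the base frame. The function names are hypothetical. Note that subtracting the 6D rotation components element-wise does not produce a valid rotation delta, so the sketch composes rotation matrices instead.

```python
import numpy as np


def rot6d_to_matrix(rot6d):
    """Convert a 6D rotation to a 3x3 matrix via Gram-Schmidt.

    Assumption: rot6d stores the first two COLUMNS of the rotation matrix.
    """
    a1 = np.asarray(rot6d[:3], dtype=float)
    a2 = np.asarray(rot6d[3:6], dtype=float)
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1          # remove the component along b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)                  # completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)


def matrix_to_axis_angle(rotation):
    """Return the rotation vector (axis * angle) of a 3x3 rotation matrix.

    Adequate for the small per-step deltas of a controller; it is
    numerically unstable for angles close to pi.
    """
    cos_theta = np.clip((np.trace(rotation) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return np.zeros(3)
    axis = np.array([
        rotation[2, 1] - rotation[1, 2],
        rotation[0, 2] - rotation[2, 0],
        rotation[1, 0] - rotation[0, 1],
    ]) / (2.0 * np.sin(theta))
    return axis * theta


def delta_ee_pose(predicted_pose9, observed_pose9):
    """Delta from the last observed ee_pose to the predicted ee_pose.

    Both inputs are [x, y, z, rot6d(6)]. Returns [dx, dy, dz, rx, ry, rz],
    with the rotation delta expressed in the base frame (assumption).
    """
    predicted = np.asarray(predicted_pose9, dtype=float)
    observed = np.asarray(observed_pose9, dtype=float)
    delta_pos = predicted[:3] - observed[:3]
    r_pred = rot6d_to_matrix(predicted[3:9])
    r_obs = rot6d_to_matrix(observed[3:9])
    delta_rot = r_pred @ r_obs.T           # rotation taking r_obs to r_pred
    return np.concatenate([delta_pos, matrix_to_axis_angle(delta_rot)])


# Example: identity orientation -> 90 degree rotation about z, small translation.
observed = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
predicted = [0.1, 0.2, 0.3, 0.0, 1.0, 0.0, -1.0, 0.0, 0.0]
delta = delta_ee_pose(predicted, observed)
print(delta)
```

If the controller instead expects the rotation delta in the end-effector frame, the composition would be `r_obs.T @ r_pred`; which convention applies depends on the environment.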

However, the WidowX arm just swings around near the initial pose, and sometimes it moves backward until it is out of the observation. Just as in #52, the WidowX didn't work as expected and the robot moved in the wrong direction.

Question 1: Is this the expected performance of the pretrained model (RDT-1B)?
Question 2: Is there an issue with the input and output data format I used? Can the pretrained RDT-1B model output delta_ee_pose? If so, where is that output stored in the unified action space? What are the input and output formats used during pretraining on Bridge V2?
Question 3: What coordinate system was used during the pretraining of RDT-1B? When evaluating the policy, I am unsure which point should be used as the origin of the coordinate system so that the output performance is not affected.
