Description
In the SimplerEnv maniskill3 branch, we would like to apply RDT to a series of 'widowx bridge' tasks. However, the environment only supports the WidowX robot with a delta ee_pose controller, not an absolute ee_pose controller.
I found that during pretraining, the input of RDT is `arm_joint_0_pos,arm_joint_1_pos,arm_joint_2_pos,arm_joint_3_pos,arm_joint_4_pos,arm_joint_5_pos,gripper_joint_0_pos,eef_pos_x,eef_pos_y,eef_pos_z,eef_angle_0,eef_angle_1,eef_angle_2,eef_angle_3,eef_angle_4,eef_angle_5`, but I can't find the output data format (`action["format"]`).
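To avoid mixing up fields when packing observations or unpacking predictions, the 16-dim state layout listed above can be made explicit with an index table. This is a minimal sketch that assumes the names are stored in exactly the listed order; `STATE_FIELDS`, `pack_state` and `split_state` are hypothetical helpers, not part of the RDT codebase.

```python
import numpy as np

# Field order as listed above (ASSUMED to match the order used during pretraining).
STATE_FIELDS = (
    [f"arm_joint_{i}_pos" for i in range(6)]
    + ["gripper_joint_0_pos"]
    + ["eef_pos_x", "eef_pos_y", "eef_pos_z"]
    + [f"eef_angle_{i}" for i in range(6)]
)
# Name -> position in the 16-dim vector.
FIELD_INDEX = {name: i for i, name in enumerate(STATE_FIELDS)}


def pack_state(joint_pos, gripper_pos, eef_pos, eef_angle6):
    """Concatenate the pieces into a 16-dim vector in STATE_FIELDS order."""
    vec = np.concatenate([
        np.asarray(joint_pos, dtype=np.float64).reshape(6),
        np.asarray([gripper_pos], dtype=np.float64),
        np.asarray(eef_pos, dtype=np.float64).reshape(3),
        np.asarray(eef_angle6, dtype=np.float64).reshape(6),
    ])
    assert vec.shape == (len(STATE_FIELDS),)
    return vec


def split_state(vec):
    """Inverse of pack_state: return (joint_pos, gripper, eef_pos, eef_angle6)."""
    vec = np.asarray(vec)
    return vec[0:6], vec[6], vec[7:10], vec[10:16]
```

Looking up `FIELD_INDEX["eef_pos_x"]` (which is 7) instead of hard-coding offsets keeps the packing code and the field list from drifting apart.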
According to #14, I guessed that the output is the future absolute state (joint_pos (6), gripper (1) and ee_pose (9)). So I took the predicted future 6-dimensional joint_pos of the WidowX, solved FK to get the ee_pose, and finally sent delta_ee_pose = predict_ee_pose - last_obs_ee_pose to the delta ee_pose controller, which is the only one we have. [I also tried using the predicted future ee_pose (9) directly, but its performance was very poor, and the resulting delta ee_pose was an order of magnitude larger.]
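Subtracting poses element-wise is fine for the position part, but subtracting two rotation encodings (such as the six `eef_angle_*` values) does not in general give a valid rotation delta, which may be one contributor to the oversized deltas. The sketch below composes rotations instead. It assumes that `eef_angle_0..5` is the 6D rotation representation made of the first two columns of the rotation matrix (not confirmed by the source), and that the FK step (not shown) already yields a position and a 3x3 rotation matrix. Which frame the delta controller expects for the rotation is also an assumption to verify, so both options are exposed; the function names are hypothetical.

```python
import numpy as np


def rot6d_to_matrix(r6):
    """Gram-Schmidt on a 6D rotation.

    ASSUMES r6 = first column (3 values) followed by second column (3 values).
    """
    r6 = np.asarray(r6, dtype=np.float64)
    a1, a2 = r6[:3], r6[3:6]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)  # b1, b2, b3 as columns


def rotation_log(R):
    """Axis-angle vector (rad) of rotation matrix R; accurate away from angle = pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-6:
        return 0.5 * w  # first-order approximation for tiny rotations
    return theta / (2.0 * np.sin(theta)) * w


def delta_ee_pose(pos_obs, rot_obs, pos_pred, rot_pred, frame="base"):
    """Return [dx, dy, dz, rx, ry, rz] from the observed to the predicted pose.

    Translation is always expressed in the robot base frame.
    frame="base": rotation delta in the base frame (R_pred @ R_obs^T).
    frame="ee":   rotation delta in the current end-effector frame (R_obs^T @ R_pred).
    """
    dpos = np.asarray(pos_pred, dtype=np.float64) - np.asarray(pos_obs, dtype=np.float64)
    if frame == "base":
        dR = rot_pred @ rot_obs.T
    elif frame == "ee":
        dR = rot_obs.T @ rot_pred
    else:
        raise ValueError(f"unknown frame: {frame}")
    return np.concatenate([dpos, rotation_log(dR)])
```

For example, going from the identity orientation to a 0.1 rad rotation about z gives a rotation delta of `[0, 0, 0.1]` in either frame; any scaling or clipping the controller applies to this delta is not handled here.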
However, the WidowX arm just swings around its initial pose, and sometimes it moves backward until it leaves the camera view. Just like in #52, the WidowX doesn't behave as expected and moves in the wrong direction.
Question 1: Is this the expected performance of the pre-trained model (RDT-1B)?
Question 2: Is there an issue with the input and output data formats I used? Can the pre-trained RDT-1B model output delta_ee_pose? If so, where is it stored in the unified action space? What were the input and output formats during pretraining on Bridge V2?
Question 3: What coordinate system was used during the pretraining of RDT-1B? When evaluating the policy, I am unsure which point should be used as the origin of the coordinate system so as not to degrade the model's output.
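Whichever origin turns out to be the right one, re-expressing poses in a different frame is a single rigid-transform composition. The sketch below assumes, purely for illustration and without confirmation from the source, that the robot base is the desired origin, and that world-frame poses of the base and the end effector are available as 4x4 homogeneous transforms. Where those poses come from in the simulator is left open; the helper names are hypothetical.

```python
import numpy as np


def make_transform(R, p):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T


def invert_transform(T):
    """Closed-form inverse of a rigid transform: rotation R^T, translation -R^T p."""
    R, p = T[:3, :3], T[:3, 3]
    return make_transform(R.T, -R.T @ p)


def world_to_base(T_world_base, T_world_ee):
    """Express an end-effector pose given in the world frame in the robot base frame."""
    return invert_transform(T_world_base) @ T_world_ee
```

For example, with the base at world position `[1, 0, 0]` rotated 90 degrees about z, an end effector at world position `[1, 1, 0]` sits at `[1, 0, 0]` in the base frame. Comparing the policy's behavior with poses expressed in the world frame versus the base frame is one way to probe which convention the pretrained model expects.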