Hi, thanks for your great work!
I have a question regarding reproducibility of environment initializations.
If I set a seed and specify the number of environments I want to create, is it possible to always obtain the same initial conditions for those environments?
The reason I'm asking is that I'm comparing a baseline against my own implementation, and for a fair evaluation I'd like both to start under identical conditions, at least for an evaluation phase.
Is there currently a way to “copy” or reproduce the initial conditions, or would I need to implement a deterministic reset mechanism myself?
Thanks again!
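To illustrate what I mean by a deterministic reset, here is a pure-Python sketch (the `reset_envs` function and the state dimension are made up for illustration, not from any library): seeding a dedicated RNG at reset time makes the same `(seed, n_envs)` pair always produce the same initial states.

```python
import random
from typing import List


def reset_envs(n_envs: int, seed: int) -> List[List[float]]:
    """Sketch of a deterministic reset: same (seed, n_envs) -> same states."""
    # A dedicated generator avoids interference from the global random state.
    rng = random.Random(seed)
    # 4 is a hypothetical per-env state dimension.
    return [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(n_envs)]


# Two resets with the same seed give identical initial conditions,
# so a baseline and a new policy can be evaluated from the same starts.
states_a = reset_envs(8, seed=0)
states_b = reset_envs(8, seed=0)
assert states_a == states_b
```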
Here is an example of my code:
```python
from typing import List, Tuple

import cv2
import numpy as np
import torch
from torchrl.envs import TransformedEnv


def rollout_policy(
    env: TransformedEnv,
    policy: torch.nn.Module,
    steps: int,
    video_path: str | None = None,
) -> Tuple[List[float], List[float]]:
    """
    Generic rollout function for both learned and heuristic policies.
    Runs `policy` in `env` for `steps` timesteps, returns mean/std reward traces.
    """
    def save_video(frames: List[np.ndarray], filename: str, fps: int = 30):
        if not frames:
            print("[warning] no frames to write", filename)
            return
        h, w, _ = frames[0].shape
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        if not filename.endswith(".mp4"):
            filename += ".mp4"
        vout = cv2.VideoWriter(filename, fourcc, fps, (w, h))
        for fr in frames:
            vout.write(fr)
        vout.release()

    # video setup
    frames: List[np.ndarray] = []
    callback = None
    if video_path is not None:
        callback = lambda e, td: frames.append(e.render(mode="rgb_array"))

    with torch.no_grad():
        td = env.rollout(
            max_steps=steps,
            policy=policy,
            callback=callback,
            break_when_any_done=False,
            auto_cast_to_device=True,
        )

    # rewards: [n_envs, T, n_agents] (batch dims lead; time is the last batch dim)
    # env.reward_key may be a plain string or a nested tuple key, so normalize it.
    reward_key = env.reward_key if isinstance(env.reward_key, tuple) else (env.reward_key,)
    rewards = td.get(("next",) + reward_key, None)
    if rewards is None:
        rewards = td.get(reward_key, None)
    if rewards is None:
        raise RuntimeError("No rewards found in rollout")

    mean_trace, std_trace = [], []
    for t in range(rewards.shape[1]):  # loop over timesteps
        r_t = rewards[:, t, :]               # [n_envs, n_agents]
        r_tot = r_t.mean(dim=1).squeeze(-1)  # [n_envs]
        # _iqm_and_iqrstd_1d is a helper defined elsewhere in my code
        m, s = _iqm_and_iqrstd_1d(r_tot.cpu().numpy())
        mean_trace.append(float(m))
        std_trace.append(float(s))

    if video_path is not None:
        save_video(frames, video_path, fps=30)

    return mean_trace, std_trace
```