[RLlib] params.json is not a valid JSON file when using PPO #50051

Open
ema-pe opened this issue Jan 24, 2025 · 0 comments
Labels
bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component)


ema-pe commented Jan 24, 2025

What happened + What you expected to happen

Bug: when training with the PPO algorithm, the `params.json` file in the experiment directory is not a valid JSON representation of the experiment parameters. The `params.pkl` file is fine, because it is the pickle serialization of the PPOConfig object.

As an example, I ran an experiment with a simple environment using PPO; this is the content of the `params.json` file in the result directory:

$ ls ~/ray_results/PPO_SimplexTest_2025-01-24_15-26-24fb0ql0hd/
events.out.tfevents.1737728784.dfaas-marl  params.json  params.pkl  progress.csv  result.json
$ cat ~/ray_results/PPO_SimplexTest_2025-01-24_15-26-24fb0ql0hd/params.json
"<ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x7647680efe30>"
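For reference, this looks like what happens when `json.dump` is given an object it cannot serialize together with a stringifying fallback. This is only a guess at the mechanism (I have not checked the RLlib logger source); the stand-in class below is hypothetical:

```python
import json


class PPOConfigStandIn:
    """Stand-in for a PPOConfig-like object with no JSON support."""


# json.dumps cannot serialize arbitrary objects; a common fallback is
# default=str, which replaces the whole object with its repr. The result
# is a JSON string literal like the one found in params.json above.
obj = PPOConfigStandIn()
dumped = json.dumps(obj, default=str)
print(dumped)
```

The output is a quoted repr such as `"<__main__.PPOConfigStandIn object at 0x...>"`, matching the observed file content.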

Expected behavior: the `params.json` file should contain a valid JSON representation of the PPOConfig object.

This is a low-severity issue because `params.pkl` is serialized correctly, and Ray RLlib uses that file, not the JSON one, when restoring from a checkpoint. I have not tried other algorithms. Still, the JSON file is the only interoperable, human-readable, Python-version-independent record of the algorithm configuration.

Versions / Dependencies

Ubuntu 24.04.1 LTS with Ray 2.40.0.

Reproduction script

Just run this script and then browse the results directory to find the `params.json` file.

import gymnasium as gym

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.spaces.simplex import Simplex
from ray.tune.registry import register_env


class SimplexTest(gym.Env):
    """SimplexTest is a sample environment that basically does nothing."""

    def __init__(self, config=None):
        self.action_space = Simplex(shape=(3,))
        self.observation_space = gym.spaces.Box(shape=(1,), low=-1, high=1)
        self.max_steps = 100

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # Seed self.np_random, used in step().
        self.current_step = 0

        obs = self.observation_space.sample()
        return obs, {}

    def step(self, action):
        self.current_step += 1

        obs = self.observation_space.sample()
        reward = self.np_random.random()
        terminated = self.current_step == self.max_steps
        return obs, reward, terminated, False, {}


register_env("SimplexTest", lambda env_config: SimplexTest(config=env_config))


if __name__ == "__main__":
    # Algorithm config.
    ppo_config = (
        PPOConfig()
        # By default RLlib uses the new API stack, but I use the old one.
        .api_stack(
            enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False
        )
        .environment(env="SimplexTest")
        .framework("torch")
        .env_runners(num_env_runners=0)  # Get experiences in the main process.
        .evaluation(evaluation_interval=None)  # No automatic evaluation.
        .resources(num_gpus=1)
    )

    # Build the experiment.
    ppo_algo = ppo_config.build()
    print(f"Algorithm initialized ({ppo_algo.logdir = })")

    iterations = 2
    print(f"Start of training ({iterations = })")
    for iteration in range(iterations):
        print(f"Iteration {iteration}")
        ppo_algo.train()
    print("Training terminated")

    ppo_algo.stop()
    print("Training end")
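As a possible workaround until this is fixed, the config can be dumped to JSON manually. This is only a sketch: it assumes the config object exposes `to_dict()` (which `AlgorithmConfig` does in recent RLlib releases) and falls back to `repr()` for values that are not JSON-serializable, so those values stay human-readable but are not round-trippable:

```python
import json


def config_to_json(config) -> str:
    """Serialize an algorithm config to a JSON string, replacing values
    that are not JSON-serializable (spaces, classes, ...) with repr()."""
    as_dict = config.to_dict() if hasattr(config, "to_dict") else vars(config)
    return json.dumps(as_dict, default=repr, indent=2, sort_keys=True)


# In the script above, one could write right after building the config:
# with open("params_manual.json", "w") as f:
#     f.write(config_to_json(ppo_config))
```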

Issue Severity

Low: It annoys or frustrates me.
