feat(example): add rl-training example #31

hellomypastor · 2025-12-23T08:52:12Z

Summary

Added a new examples/rl-training example demonstrating RL training (CartPole + DQN) inside OpenSandbox, including dependency installation, training, checkpointing, and summary output. Also updated the examples index and aligned dependencies with the default opensandbox/code-interpreter:latest image. Resolves #29 (Add Reinforcement Learning (RL) Sandbox Example).

Testing

Not run (example is environment-dependent; requires running OpenSandbox server and sandbox image)
Unit tests
Integration tests
e2e / manual verification

Breaking Changes

None
Yes (describe impact and migration path)

Checklist

Linked Issue or clearly described motivation
Added/updated docs (if needed)
Added/updated tests (if needed)
Security impact considered
Backward compatibility considered

jwx0925 · 2025-12-25T01:16:49Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-12-25T01:19:20Z

examples/rl-training/main.py

+        if not await _ensure_pip(sandbox):
+            print("Failed to bootstrap pip inside the sandbox.")
+            return


Early returns leave sandbox running on failure paths

If pip bootstrap, dependency installation, or training fails, the function returns from inside the async with sandbox block before reaching the kill call at the end. Sandbox.__aexit__ only closes client resources and does not terminate the remote instance, so the sandbox is left orphaned and keeps consuming resources until an external timeout. Wrapping the block in a try/finally that always calls sandbox.kill() would avoid the leak.
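
A minimal sketch of the suggested try/finally pattern, under stated assumptions. FakeSandbox, ensure_pip, and run are hypothetical stand-ins written for illustration; they are not the OpenSandbox API and not the code in examples/rl-training/main.py. The fake only mirrors the behavior described above: leaving the context manager closes client resources, while kill() is what terminates the instance.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in for the sandbox client (not the real OpenSandbox API)."""

    def __init__(self) -> None:
        self.client_closed = False
        self.killed = False

    async def __aenter__(self) -> "FakeSandbox":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Mirrors the behavior described in the review: only local client
        # resources are released; the remote instance is not terminated.
        self.client_closed = True

    async def kill(self) -> None:
        # Stands in for terminating the remote sandbox instance.
        self.killed = True


async def ensure_pip(sandbox: FakeSandbox) -> bool:
    """Hypothetical bootstrap step that always fails, to exercise the failure path."""
    return False


async def run(sandbox: FakeSandbox) -> bool:
    async with sandbox:
        try:
            if not await ensure_pip(sandbox):
                print("Failed to bootstrap pip inside the sandbox.")
                return False
            # Dependency installation and training would go here.
            return True
        finally:
            # Runs on early returns and on exceptions alike.
            await sandbox.kill()


sandbox = FakeSandbox()
succeeded = asyncio.run(run(sandbox))
```

Because the finally clause sits inside the async with block, kill() runs before the context manager exits, so the early return on the pip failure path still terminates the sandbox.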


feat(example): add rl-training example

55f58d5

hellomypastor requested review from Pangjiping, hittyt, jwx0925 and ninan-nn as code owners December 23, 2025 08:52

chatgpt-codex-connector bot reviewed Dec 25, 2025
