Add model eval #8

Merged (19 commits) on Mar 5, 2024
26 changes: 0 additions & 26 deletions .dockerignore

This file was deleted.

3 changes: 2 additions & 1 deletion .gitignore
@@ -140,6 +140,7 @@ nohup.out
output/
runs/
wandb/
out/

.DS_Store
.vscode/
@@ -149,4 +150,4 @@ wandb/

log.txt
pid.txt
data/
data/
6 changes: 0 additions & 6 deletions CITATION.bib

This file was deleted.

41 changes: 41 additions & 0 deletions config.yaml
@@ -0,0 +1,41 @@
env_name: CartPole-v1 # RL environment name
base_path: /data/evan_gunter/vlmrm/runs/training_cartpole # Base path to save logs and checkpoints
seed: 42 # Seed for reproducibility
description: CartPole training using CLIP reward
tags: # Wandb tags
- training
- cartpole
- CLIP
reward:
name: clip
pretrained_model: ViT-g-14/laion2b_s34b_b88k # CLIP model name
# CLIP batch size per synchronous inference step.
# Batch size must be divisible by n_workers (GPU count)
# so that it can be shared among workers, and must be a divisor
# of n_envs * episode_length so that all batches can be of the
# same size (no support for variable batch size as of now.)
batch_size: 1600
alpha: 0.5 # Alpha value of Baseline CLIP (CO-RELATE)
target_prompts: # Description of the goal state
- pole vertically upright on top of the cart
baseline_prompts: # Description of the environment
- pole and cart
# Path to pre-saved model weights. When executing multiple runs,
# mount a volume to this path to avoid downloading the model
# weights multiple times.
cache_dir: /data/evan_gunter/vlmrm/.cache_clip
rl:
policy_name: MlpPolicy
n_steps: 100000 # Total number of simulation steps to be collected.
n_envs_per_worker: 4 # Number of environments per worker (GPU)
episode_length: 200 # Desired episode length
learning_starts: 100 # Number of env steps to collect before training
train_freq: 200 # Number of collected env steps between training iterations
batch_size: 64 # SAC buffer sample size per gradient step
  gradient_steps: 1 # Number of gradient steps per training iteration
tau: 0.005 # SAC target network update rate
gamma: 0.99 # SAC discount factor
learning_rate: 3e-4 # SAC optimizer learning rate
logging:
checkpoint_freq: 800 # Number of env steps between checkpoints
video_freq: 800 # Number of env steps between videos
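
The batch_size comment above encodes two constraints that are easy to check up front. Below is a minimal sketch of that check, assuming the relevant values are read out of config.yaml as plain integers; the variable names (n_workers, n_envs, and so on) are illustrative, not taken from the vlmrm codebase.

```python
# Illustrative check of the batch_size constraints from the reward section of config.yaml.
# Values are hard-coded here for clarity; in practice they would be parsed from the YAML.
n_workers = 4                    # assumed GPU count
n_envs = n_workers * 4           # n_envs_per_worker * n_workers
episode_length = 200             # rl.episode_length
batch_size = 1600                # reward.batch_size

# The batch must split evenly across workers (GPUs)...
assert batch_size % n_workers == 0, "batch_size must be divisible by the GPU count"

# ...and must evenly divide the frames produced per rollout,
# since variable-size batches are not supported.
assert (n_envs * episode_length) % batch_size == 0, (
    "batch_size must divide n_envs * episode_length"
)
```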
42 changes: 42 additions & 0 deletions config_obstacle_course.yaml
@@ -0,0 +1,42 @@
env_name: ObstacleCourse-v0 # RL environment name
base_path: /data/evan_gunter/vlmrm/data/runs/training_obstacle_course # Base path to save logs and checkpoints
seed: 42 # Seed for reproducibility
description: ObstacleCourse training using CLIP reward
tags: # Wandb tags
- training
- carracing
- CLIP
reward:
name: clip
pretrained_model: ViT-g-14/laion2b_s34b_b88k # CLIP model name
# CLIP batch size per synchronous inference step.
# Batch size must be divisible by n_workers (GPU count)
# so that it can be shared among workers, and must be a divisor
# of n_envs * episode_length so that all batches can be of the
# same size (no support for variable batch size as of now.)
batch_size: 1600
alpha: 0.5 # Alpha value of Baseline CLIP (CO-RELATE)
target_prompts: # Description of the goal state
- car viewed from above in the middle of a gray road, not on a pink square
baseline_prompts: # Description of the environment
- car viewed from above
# Path to pre-saved model weights. When executing multiple runs,
# mount a volume to this path to avoid downloading the model
# weights multiple times.
cache_dir: /data/evan_gunter/vlmrm/data/.cache_clip
rl:
policy_name: MlpPolicy
n_steps: 100000 # Total number of simulation steps to be collected.
  n_envs_per_worker: 4 # Number of environments per worker (GPU); TODO: revert to 2 when running on more GPUs
episode_length: 200 # Desired episode length
learning_starts: 100 # Number of env steps to collect before training
train_freq: 200 # Number of collected env steps between training iterations
batch_size: 64 # SAC buffer sample size per gradient step
  gradient_steps: 1 # Number of gradient steps per training iteration
tau: 0.005 # SAC target network update rate
gamma: 0.99 # SAC discount factor
learning_rate: 3e-4 # SAC optimizer learning rate
logging:
checkpoint_freq: 800 # Number of env steps between checkpoints
video_freq: 800 # Number of env steps between videos
# tensorboard_freq: 800 # Number of env steps between tensorboard logs
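
The rl: section uses standard Soft Actor-Critic hyperparameter names. Below is a rough sketch of how these values could map onto a stable-baselines3 SAC constructor. It is illustrative only: vlmrm's own training loop (including the CLIP reward wrapper and multi-GPU workers) is not shown, and Pendulum-v1 stands in for the real environment since SB3's SAC requires a continuous action space.

```python
# Hedged sketch: wiring the rl: values into stable-baselines3's SAC.
# Not vlmrm's actual training code; the CLIP reward wrapper is omitted.
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")  # stand-in continuous-control task

model = SAC(
    "MlpPolicy",          # policy_name
    env,
    learning_rate=3e-4,   # learning_rate
    learning_starts=100,  # learning_starts: env steps collected before training
    batch_size=64,        # batch_size: buffer sample size per gradient step
    tau=0.005,            # tau: target network update rate
    gamma=0.99,           # gamma: discount factor
    train_freq=200,       # train_freq: env steps between training iterations
    gradient_steps=1,     # gradient_steps per training iteration
)
model.learn(total_timesteps=100_000)  # n_steps
```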
56 changes: 0 additions & 56 deletions docker/Dockerfile

This file was deleted.
