Description
I am using PPORecurrent with the RecurrentDictRolloutBuffer. In #103 it is mentioned that the batch size is intended to be constant; however, this does not seem to be the case.
I ran some experiments in which I print the action batch sizes returned by the _get_samples() function.
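For reference, here is a minimal sketch of how I print those sizes. The CartPole env and MlpLstmPolicy are stand-ins for brevity (my actual env has a Dict observation space, hence the RecurrentDictRolloutBuffer), and the wrapper around _get_samples is only there for logging:

```python
import gym
from sb3_contrib import RecurrentPPO

# Stand-in env/policy; my real setup uses a Dict observation space,
# which is why the buffer is a RecurrentDictRolloutBuffer.
env = gym.make("CartPole-v1")
model = RecurrentPPO("MlpLstmPolicy", env, batch_size=10, verbose=0)

buffer = model.rollout_buffer
original_get_samples = buffer._get_samples

def logged_get_samples(*args, **kwargs):
    # Forward to the original implementation and log the resulting batch size.
    samples = original_get_samples(*args, **kwargs)
    print("action batch size:", samples.actions.shape[0])
    return samples

buffer._get_samples = logged_get_samples
model.learn(total_timesteps=2000)
```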
Exp 1: batch_size < sequence length
Initialize PPORecurrent with batch_size = 10. I am sampling only 2000 steps for debugging purposes.
The mean sequence length is 32.3. It looks like whenever n_seq = 2, i.e. two different sequences are appended, the batch size is higher than the specified 10.
Exp 2: batch_size > sequence length
Initialize PPORecurrent with batch_size = 100
The mean sequence length this time is 31.7. The batch size is now always higher than the specified 100, and it is different every time.
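For what it's worth, my reading of _get_samples (please correct me if I am wrong) is that the sampled indices are grouped into the sequences they belong to and then padded to the length of the longest one, so the effective batch size becomes n_seq * max_length rather than batch_size. A toy illustration with made-up sequence lengths (not numbers from my runs) that matches what I observe in both experiments:

```python
# Toy illustration only: the sequence lengths below are made up, not from my runs.
def padded_batch_size(seq_lengths_in_batch):
    # If each mini-batch is padded so every sequence has the length of the
    # longest one, the effective batch size is n_seq * max_length.
    return len(seq_lengths_in_batch) * max(seq_lengths_in_batch)

# Exp 1: batch_size = 10, the sampled indices happen to span two sequences (n_seq = 2)
print(padded_batch_size([4, 6]))             # 2 * 6  = 12  > 10

# Exp 2: batch_size = 100, the sampled indices span four sequences of ~32 steps
print(padded_batch_size([32, 30, 28, 10]))   # 4 * 32 = 128 > 100
```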
Is this the intended behavior? It seems wrong to me, since the aforementioned issue states explicitly that the batch size is intended to be constant:
"Actually no, the main reason is that you want to keep a mini batch size constant (otherwise you will need to adjust the learning rate for instance)."
Also, while we are at it: what is the reason for implementing mini-batching instead of feeding batches of whole sequences to the model?
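To clarify what I mean by the latter, here is a rough sketch (purely illustrative shapes and dimensions, not code from sb3-contrib): pad a handful of complete episodes to a common length and push them through the LSTM in one call.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

obs_dim, hidden_dim = 8, 64

# Three whole episodes of different lengths (made-up numbers).
episodes = [torch.randn(length, obs_dim) for length in (28, 35, 30)]

padded = pad_sequence(episodes, batch_first=True)        # (n_seq, max_len, obs_dim)
lstm = torch.nn.LSTM(obs_dim, hidden_dim, batch_first=True)
output, _ = lstm(padded)                                  # (n_seq, max_len, hidden_dim)
print(output.shape)                                       # torch.Size([3, 35, 64])
```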
System Info
Describe the characteristics of your environment:
- installed with pip
- Stable-Baselines3 1.6.0
- sb3-contrib 1.6.0
- GPU models and configuration
- Python 3.8.0
- PyTorch 1.12.1+cu116
- Gym 0.21.0