Reward for opening a door is given before door is opened. #27

@mortido

It looks like the reward for opening a 'green' door for the first time (0.1 points) is given a few frames before the door actually appears open in the observed scene. This may be caused by a delay in showing the opened door: I tried moving backwards right after the reward, and the door still opened a few frames later.

Here is code that reproduces it (run in a Jupyter notebook).

from obstacle_tower_env import ObstacleTowerEnv

# Jupyter magic: render plots inline in the notebook.
%matplotlib inline
from matplotlib import pyplot as plt

env = ObstacleTowerEnv('./ObstacleTower/obstacletower', retro=False)

# Fixed action sequence that walks up to the first door and then backs away.
moves = [
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 2, 0, 0], [1, 2, 0, 0],
    [1, 2, 0, 0], [1, 2, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0],
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [2, 0, 0, 0],
    [2, 0, 0, 0], [2, 0, 0, 0], [2, 0, 0, 0], [2, 0, 0, 0], [2, 0, 0, 0],
]

# Fix the seed and starting floor so the run is reproducible.
env.seed(0)
env.floor(1)
obs = env.reset()

for i, move in enumerate(moves):
    obs, reward, done, info = env.step(move)
    print(i, reward)
    # Show the frames from step 19 onward, around the time the door is reached.
    if i > 18:
        plt.imshow(obs[0])  # obs[0] is the visual observation
        plt.show()
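
To make the mismatch easier to pin down, the per-step rewards from the loop above could be collected in a list and used to pick which frames to look at. The following is only a minimal sketch: `rewarded_steps` and `inspection_window` are hypothetical helper names (not part of `obstacle_tower_env`), and the reward list in the example is made up for illustration, not output from the environment.

```python
def rewarded_steps(rewards, eps=1e-9):
    """Return the indices of all steps whose reward is non-zero."""
    return [i for i, r in enumerate(rewards) if abs(r) > eps]


def inspection_window(step, n_steps, before=2, after=6):
    """Return the range of step indices around `step`, clipped to [0, n_steps)."""
    start = max(0, step - before)
    stop = min(n_steps, step + after + 1)
    return range(start, stop)


if __name__ == "__main__":
    # Made-up rewards: a single 0.1 reward somewhere in a 33-step episode.
    rewards = [0.0] * 20 + [0.1] + [0.0] * 12
    for step in rewarded_steps(rewards):
        window = inspection_window(step, len(rewards))
        print("reward at step", step, "-> inspect frames", list(window))
```

With the frames stored alongside the rewards, plotting only the steps in this window shows how many frames pass between the reward and the door visibly opening.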
