Added project files
Rajarshi1001 committed Apr 21, 2024
1 parent 43f5f70 commit 802ef30
Showing 15 changed files with 176,655 additions and 222 deletions.
Binary file added CS780-Project-Final-Presentation-4.pdf
Binary file not shown.
Binary file added CS780-Project-Final-Report-4.pdf
Binary file not shown.
29 changes: 25 additions & 4 deletions README.md
@@ -1,4 +1,4 @@
# CS780 Project: Safe Exploration in Continuous Action Spaces
# CS780 Project 4: Safe Exploration in Continuous Action Spaces

This repository contains the runs for the safe exploration experiments described in the original paper. The paper proposes a novel architecture for real-world problems in which violations of safety or other critical constraints are heavily penalized.

@@ -10,11 +10,32 @@ There are two environments proposed in the paper for reference, namely the `Ball
![Safety Layer Diagram](assets/safety_layer_diagram.png)


### Implementations and Experimentations

Our experiments include designing the Safety Layer from scratch and integrating it with __DDPG__ and __Twin Delayed Deep Deterministic Policy Gradient (TD3)__ on several gym environments: `Ball-1D`, `Ball-2D`, `Ball-3D`, `Spaceship-Arena`, `Spaceship-Corridor`, and `Bioreactor`. __TD3__ improves on __DDPG__ by reducing the maximization (overestimation) bias: it trains twin critics jointly and uses the smaller of their two value estimates as the learning target. For each environment we also report rewards and cumulative constraint violations under customized reward shaping. The comparative analysis shows that adding a minimal safety layer on top of the deterministic policy improves the training and evaluation rewards the agent obtains over episodes, and that the resulting actions are nearly free of constraint violations.
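
To make the twin-critic idea concrete, here is a minimal NumPy sketch of the TD3 learning target. It is an illustration only, not the code from the notebooks: the function name `td3_target` and its arguments are hypothetical, and target policy smoothing and delayed actor updates are left out.

```python
import numpy as np

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target (hypothetical helper, not from the notebooks).

    q1_next and q2_next are the two target critics' estimates for the next
    state-action pair. Bootstrapping from the smaller of the two is what
    counteracts the overestimation bias of a single critic.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    q_min = np.minimum(np.asarray(q1_next, dtype=float),
                       np.asarray(q2_next, dtype=float))
    # No bootstrapping on terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * q_min

# Toy batch of two transitions; the second one is terminal.
targets = td3_target(reward=[1.0, 0.5], done=[0, 1],
                     q1_next=[10.0, 4.0], q2_next=[8.0, 6.0], gamma=0.9)
```

Both critics are then regressed toward the same `targets`, so neither one can drift upward on its own.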

The plots obtained with the safety layer show that, across environments, the agent converges to near-optimal rewards in far fewer episodes. Action correction comes at the cost of increased wall-clock time, since every action selection requires a forward pass through the trained constraint model to return a safe action. The implementation also achieves zero constraint violations in some of the environments, which highlights the potential of a linear safety approximation in several industrial use cases.
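
The action correction can be sketched as follows, assuming a linear constraint model of the form `c_i(s') ≈ c_i(s) + g_i(s)·a <= C_i` and the closed-form projection that handles only the most violated constraint. The function `safe_action` and its argument names are hypothetical; in the actual implementation, `G` would come from the forward pass through the trained constraint model.

```python
import numpy as np

def safe_action(action, c, G, limits, eps=1e-8):
    """Project an action onto the linearized safe set (illustrative sketch).

    action : proposed action from the policy, shape (action_dim,)
    c      : current constraint values c_i(s), shape (num_constraints,)
    G      : constraint-model outputs g_i(s), shape (num_constraints, action_dim)
    limits : constraint thresholds C_i, shape (num_constraints,)
    """
    action = np.asarray(action, dtype=float)
    c = np.asarray(c, dtype=float)
    G = np.asarray(G, dtype=float)
    limits = np.asarray(limits, dtype=float)

    # Predicted violation of each constraint if `action` were executed.
    violation = G @ action + c - limits
    # Per-constraint multiplier; zero wherever the constraint is satisfied.
    multipliers = np.maximum(violation / (np.sum(G * G, axis=1) + eps), 0.0)
    # Correct only along the most violated constraint.
    i = int(np.argmax(multipliers))
    return action - multipliers[i] * G[i]

# One constraint, 1-D action: the proposed action would overshoot the limit.
corrected = safe_action(action=[1.0], c=[0.8], G=[[0.5]], limits=[1.0])
# An action that is already safe is returned unchanged.
unchanged = safe_action(action=[0.2], c=[0.8], G=[[0.5]], limits=[1.0])
```

The correction itself is a few vector operations, so the extra wall-clock time comes mainly from evaluating the constraint model on every step.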

All results are compiled as `.npy` files in the [files link](https://drive.google.com/drive/folders/1se0HGsBH06XXP2wex8Xb_PkeqJi4pAr9). The script for visualizing the results for all of the environments above is available at [Link](https://drive.google.com/drive/folders/1se0HGsBH06XXP2wex8Xb_PkeqJi4pAr9). Some visualizations and comparisons can be found in [Link](https://drive.google.com/drive/folders/1gF_vI_uZAj0ecLslkLzX1Wou9WhnazSP).
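
As a rough sketch of how such `.npy` results could be post-processed before plotting, the snippet below smooths per-episode rewards and accumulates constraint violations. The file names, array layout (one value per episode), and helper names are assumptions made for illustration; the demo writes synthetic arrays to a temporary directory instead of reading the real result files.

```python
import os
import tempfile

import numpy as np

def moving_average(values, window=10):
    """Smooth a non-empty 1-D series with a simple moving average."""
    values = np.asarray(values, dtype=float)
    window = max(1, min(window, len(values)))
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def summarize_run(rewards_path, violations_path, window=10):
    """Load one run and return (smoothed rewards, cumulative violations)."""
    rewards = np.load(rewards_path)
    violations = np.load(violations_path)
    return moving_average(rewards, window), np.cumsum(violations)

# Demo on synthetic data; the file names below are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    r_path = os.path.join(tmp, "rewards.npy")
    v_path = os.path.join(tmp, "violations.npy")
    np.save(r_path, np.array([1.0, 2.0, 3.0, 4.0]))
    np.save(v_path, np.array([1, 0, 2, 0]))
    smoothed, cumulative = summarize_run(r_path, v_path, window=2)
```

The two returned arrays can be passed directly to a plotting library to compare runs with and without the safety layer.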

## Project-based resources

- [Intro PPT](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Presentation-4.pdf)
- [Mid-Term Report](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Report-4.pdf)

- [Initial Presentation](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Presentation-4.pdf)
- [End-Term Presentation](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Final-Presentation-4.pdf)
- [Mid-Term Report](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Report-4.pdf)
- [Final Report](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Final-Report-4.pdf)


All working implementations can be found inside the `./notebooks` directory.

###

100 changes: 0 additions & 100 deletions ddpg_safety_layer_100_epochs.csv

This file was deleted.

70 changes: 0 additions & 70 deletions ddpg_safety_layer_70_epochs.csv

This file was deleted.
