Added project files

Rajarshi1001 · Apr 21, 2024 · 802ef30
1 parent 43f5f70
commit 802ef30
Showing 15 changed files with 176,655 additions and 222 deletions.
diff --git a/CS780-Project-Final-Presentation-4.pdf b/CS780-Project-Final-Presentation-4.pdf
diff --git a/CS780-Project-Final-Report-4.pdf b/CS780-Project-Final-Report-4.pdf
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# CS780 Project: Safe Exploration in Continuous Action Spaces
+# CS780 Project 4: Safe Exploration in Continuous Action Spaces
 
 This repository contains runs of the safe exploration experiments described in the original paper. The paper proposes an architecture for real-world problems in which violating safety-critical constraints is heavily penalized.
 
@@ -10,11 +10,32 @@ There are two environments proposed in the paper for reference, namely the `Ball
 ![Safety Layer Diagram](assets/safety_layer_diagram.png)
 
 
+### Implementation and Experiments
+
+Our experiments include designing the Safety
+Layer from scratch and integrating it with __DDPG__
+and __Twin Delayed Deep Deterministic Policy Gradient
+(TD3)__ on several gym environments: `Ball-1D`, `Ball-2D`, `Ball-3D`, `Spaceship-Arena`, `Spaceship-Corridor`, and `Bioreactor`. The __TD3__ algorithm improves on __DDPG__ by reducing
+overestimation (maximization) bias: it trains twin critics and uses the smaller of their two estimates when forming the target value. We also report rewards and cumulative constraint violations for each environment, with customized
+reward shaping. Our comparative analysis shows that a minimal safety layer on top of the deterministic policy model improves the training and evaluation rewards the agent obtains over episodes in each environment, and that the resulting actions are nearly free of constraint violations.
+
+The plots obtained with the safety layer for the different environments show that the agent's rewards converge in far fewer episodes. The action correction comes at the cost of increased wall-clock time, since
+every action selection runs a forward pass through the trained constraint model to obtain a safe action. The implementation also yields zero constraint violations in some of the environments, which highlights the potential of a linear safety approximation in
+industrial use cases.
+
+All of the results are compiled as `.npy`
+files inside the [files link](https://drive.google.com/drive/folders/1se0HGsBH06XXP2wex8Xb_PkeqJi4pAr9). The script for visualizing the results for all of the environments above is at [Link](https://drive.google.com/drive/folders/1se0HGsBH06XXP2wex8Xb_PkeqJi4pAr9). Some visualizations and comparisons can be found in
+[Link](https://drive.google.com/drive/folders/1gF_vI_uZAj0ecLslkLzX1Wou9WhnazSP).
+
 ## Project based resources
 
-- [Intro PPT](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Presentation-4.pdf)
-- [Mid-Term Report](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Report-4.pdf)
-
+- [Initial Presentation](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Presentation-4.pdf)
+- [End-Term Presentation](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Final-Presentation-4.pdf) 
+- [Mid-Term Report](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Initial-Report-4.pdf)
+- [Final Report](https://github.com/Rajarshi1001/CS780_Project/blob/master/CS780-Project-Final-Report-4.pdf)
+
+
+All the working implementations are in the `./notebooks` directory.
 
 ###
 
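The README text added in this commit describes TD3 as reducing maximization bias with twin critics. A minimal sketch of that idea follows, assuming the standard TD3 formulation (clipped double-Q target plus target policy smoothing) rather than the code in `./notebooks`, which is not shown in this diff. The function names `td3_target` and `smoothed_target_action` and all default values are hypothetical, and scalars are used instead of tensors to keep the example dependency-free.

```python
def smoothed_target_action(mu_next, noise, noise_clip=0.5, a_low=-1.0, a_high=1.0):
    """Target policy smoothing: add clipped noise to the target actor's output.

    mu_next is the (scalar) action proposed by the target actor; the caller
    samples `noise`. All bounds here are illustrative assumptions.
    """
    clipped_noise = max(-noise_clip, min(noise_clip, noise))
    return max(a_low, min(a_high, mu_next + clipped_noise))


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller twin-critic estimate.

    q1_next and q2_next are the two target critics' values at the next state
    and the smoothed target action. Taking the minimum counters the
    overestimation that a single critic's bootstrapped maximum tends to produce.
    """
    not_done = 1.0 - float(done)
    return reward + gamma * not_done * min(q1_next, q2_next)
```

For example, `td3_target(1.0, False, 10.0, 8.0, gamma=0.5)` bootstraps from the smaller estimate `8.0`, while a terminal transition (`done=True`) returns the reward alone.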

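The README text above mentions a linear safety approximation and a per-step action correction through a trained constraint model. One way to sketch that correction is the closed-form projection used in the original safety layer formulation, assuming each constraint is linearized as `c_i(s') ≈ c_i(s) + g_i(s) · a`. In the sketch below, `grads` stands in for the constraint model's outputs `g_i(s)`; the function name `safe_action` and its parameters are hypothetical, and the notebooks in this repository may implement the step differently.

```python
def safe_action(action, grads, c_now, c_max, eps=1e-8):
    """Correct `action` so the most violated linearized constraint is satisfied.

    action: list of floats, the action proposed by the policy.
    grads:  one list per constraint, the model's sensitivity g_i(s)
            (assumed to come from a trained constraint model).
    c_now:  current constraint values c_i(s).
    c_max:  constraint limits C_i.
    """
    multipliers = []
    for g, c, limit in zip(grads, c_now, c_max):
        predicted = sum(gi * ai for gi, ai in zip(g, action)) + c
        # Multiplier is positive only if the predicted value exceeds the limit.
        lam = max(0.0, (predicted - limit) / (sum(gi * gi for gi in g) + eps))
        multipliers.append(lam)

    # Correct along the gradient of the most violated constraint only.
    worst = max(range(len(multipliers)), key=multipliers.__getitem__)
    lam = multipliers[worst]
    return [ai - lam * gi for ai, gi in zip(action, grads[worst])]
```

An action that already satisfies every linearized constraint gets a zero multiplier and is returned unchanged, so the correction only costs the extra forward pass needed to produce `grads`.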
diff --git a/ddpg_safety_layer_100_epochs.csv b/ddpg_safety_layer_100_epochs.csv
diff --git a/ddpg_safety_layer_70_epochs.csv b/ddpg_safety_layer_70_epochs.csv