A collaborative industrial project that uses a Deep Reinforcement Learning agent to efficiently allocate tasks to nodes in an adaptive distributed embedded system. The agent additionally handles critical tasks to ensure fail-safety compliance and optimizes message passing among tasks, solving an NP-hard combinatorial problem in linear time.
Make sure you have the latest version of Python 3.9 installed, then install the dependencies:
pip install -r requirements.txt
Navigate to the src folder and run:
Format:
python main.py --config [PATH_CONFIG_1] [PATH_CONFIG_2] [--Param_Header1] [Param_Value1] ... and so on
Example:
python main.py --train false --model_path ../experiments/models/p1/trnc_c/early_term_1000 --config utils/configs/problem_1.yaml utils/configs/experiment_trnc_c.yaml --experiment_name custom_experiments --run_name first_inference
Note: Every parameter in the existing configuration files is modifiable; any of them can be overridden as a command-line argument by prefixing its name with a double dash (--).
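For example, assuming one of the loaded configuration files defines an entry named learning_rate (a hypothetical parameter name used here purely for illustration), it could be overridden directly from the command line:
python main.py --config utils/configs/problem_1.yaml utils/configs/experiment_tn.yaml --learning_rate 0.0003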
Note 2: For the Act-Replace category (either inference or training), set the parameter invalid_action_replacement to true in default.yaml.
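A minimal sketch of the corresponding entry in default.yaml (the exact location and nesting of the key inside the file may differ):

```yaml
# default.yaml (excerpt, sketch only -- actual nesting may differ)
invalid_action_replacement: true
```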
Note 3: For the Act-Mask category (either inference or training), replace PPOModel with MaskablePPOModel in main.py by importing it from the Maskable PPO implementation: from models.maskable_ppo import MaskablePPOModel
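A minimal sketch of what that swap might look like in main.py; only the MaskablePPOModel import is taken from this README, while the original PPOModel import path and the constructor arguments are assumptions for illustration:

```python
# main.py (sketch) -- swap the PPOModel import and instantiation for Act-Mask.
# The original import path and constructor arguments below are assumed.

# from models.ppo import PPOModel                  # original import (assumed path)
from models.maskable_ppo import MaskablePPOModel   # Act-Mask replacement

# model = PPOModel(config)                         # original instantiation (assumed)
model = MaskablePPOModel(config)                   # use the maskable variant instead,
                                                   # where config is the loaded configuration
```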
Training example:
python main.py --config utils/configs/problem_1.yaml utils/configs/experiment_tn.yaml --experiment_name custom_experiments --run_name first_train
Note: You may also provide your own custom configuration file.
The study defines three problem sets and multiple configuration variants to evaluate the performance of RL agents in a CADES (Configurable Adaptive Distributed Execution System). These scenarios aim to emulate the dynamic and unpredictable conditions of real-world systems.
- Problem 1: A static system configuration with fixed tasks and nodes. This serves as a baseline to evaluate basic performance.
- Problem 2: Introduces variability in task numbers and costs while keeping nodes constant. It reflects fluctuating task demands with stable hardware resources.
- Problem 3: Adds complexity by varying tasks, their costs, and the number of nodes. This scenario includes potential node downtimes, representing real-life challenges with dynamic task demands and resource failures.
Problem No. | Tasks (#) | Task Cost | Nodes (#) | Node Capacity |
---|---|---|---|---|
1 | 12 | 4 | 6 | 12 |
2 | 8 to 10 | 4 to 6 | 6 | 12 |
3 | 8 to 10 | 4 to 6 | 6 to 8 | 10 to 12 |
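For orientation, here is a hypothetical sketch of how the Problem 1 setting above might be expressed in a problem configuration file such as utils/configs/problem_1.yaml; the field names below are invented for illustration and do not reflect the repository's actual schema:

```yaml
# Hypothetical sketch only -- field names are illustrative, not the real schema
num_tasks: 12
task_cost: 4
num_nodes: 6
node_capacity: 12
```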
To capture different scenarios that may arise during the reconfiguration of a CADES, we propose several distinct configuration variants:
- TN: Tasks and nodes are available, but no replicas or communication are required. Represents non-critical, independent task execution scenarios.
- TRN: Adds replicas for critical tasks but no communication. Focuses on fault tolerance for critical tasks.
- TRNC: Includes tasks, nodes, replicas, and communication, divided into:
- TRNC A: Communication among non-critical tasks.
- TRNC B: Communication among critical tasks.
- TRNC C: Combines communication for both critical and non-critical tasks.
Each of these variants captures different levels of complexity, reflecting the diverse operational conditions that a CADES may encounter during reconfiguration.
Category | Tasks (T) | Nodes (N) | Replicas (R) | Communication (C) |
---|---|---|---|---|
TN | ✔ | ✔ | ✘ | ✘ |
TRN | ✔ | ✔ | ✔ | ✘ |
TRNC A | ✔ | ✔ | ✔ | Non-critical tasks only |
TRNC B | ✔ | ✔ | ✔ | Critical tasks only |
TRNC C | ✔ | ✔ | ✔ | Both non-critical and critical tasks |
Different invalid action handling strategies are employed to conduct a comparative study of their effects on different configuration problems. These techniques are referenced in the results section and are summarized as follows:
- Early-Term: Early Termination; the episode is terminated as soon as the agent selects an invalid action.
- Act-Replace: Action Replacement; an invalid action is replaced with a valid one instead of terminating the episode.
- Act-Mask: Action Masking; the logits of invalid actions are masked so they cannot be sampled (see the sketch after this list).
These strategies are evaluated to understand their impact on solving different configuration problems effectively.
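To make the Act-Mask idea concrete, here is a minimal, self-contained sketch of logits masking; it is illustrative only and does not reproduce the repository's MaskablePPOModel implementation:

```python
import numpy as np

def mask_logits(logits: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Set the logits of invalid actions to -inf so their probability becomes zero."""
    return np.where(valid, logits, -np.inf)

def softmax(x: np.ndarray) -> np.ndarray:
    z = x - np.max(x)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: 4 candidate actions, of which actions 1 and 3 are invalid in this state
logits = np.array([1.2, 0.4, -0.3, 2.0])
valid = np.array([True, False, True, False])
probs = softmax(mask_logits(logits, valid))
# probs[1] and probs[3] are exactly 0, so the agent can never sample an invalid action
print(probs)
```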
The combination of problem sets and configuration variants provides a comprehensive framework for evaluating the RL agent's ability to handle dynamic, real-world challenges in a CADES. These scenarios test the agent's fault tolerance, adaptability, and task allocation efficiency under varying levels of complexity.
| Problem No. | Strategy | TN | TRN | TRNC A | TRNC B | TRNC C |
|---|---|---|---|---|---|---|
| 1 | Early-Term | **100** | 98 | 93 | **97** | **88** |
| | Act-Replace | 99 | 97 | **94** | 95 | **88** |
| | Act-Mask | **100** | **100** | 59 | 60 | 57 |
| 2 | Early-Term | **100** | 96 | **97** | **99** | 91 |
| | Act-Replace | 97 | **99** | 96 | 92 | 90 |
| | Act-Mask | **100** | **99** | 91 | 95 | **96** |
| 3 | Early-Term | 94 | 93 | 88 | 86 | 83 |
| | Act-Replace | **96** | **98** | **90** | **88** | **88** |
| | Act-Mask | 91 | 93 | 80 | 86 | 84 |
Note: Bolded values indicate the highest performance in each category (TN, TRN, TRNC A, TRNC B, TRNC C) for that specific Problem Number.
Detailed results can be found in the paper.
- Optimizing the deep reinforcement learning agent to fulfill message passing among tasks more efficiently
- Trying other RL algorithms, e.g., Q-learning
- Applying efficient reward shaping strategies