
Deep reinforcement learning agent for task allocation in adaptive distributed embedded systems facilitating critical tasks and message passing


Introduction

A collaborative industrial project that uses a Deep Reinforcement Learning agent to efficiently allocate tasks to nodes in an adaptive distributed embedded system. Additionally, the agent handles critical tasks to ensure fail-safety compliance and optimizes message passing among tasks, solving an NP-hard combinatorial problem in linear time.

RL Flow Diagram

Installation

Make sure you have the latest version of Python 3.9 installed.

pip install -r requirements.txt

Inference

Navigate to the src folder and run:

Format

python main.py --config [PATH_CONFIG_1] [PATH_CONFIG_2] [--PARAM_NAME_1] [PARAM_VALUE_1] ... and so on

Example

python main.py --train false --model_path ../experiments/models/p1/trnc_c/early_term_1000 --config utils/configs/problem_1.yaml utils/configs/experiment_trnc_c.yaml --experiment_name custom_experiments --run_name first_inference

Note: Every parameter in the existing configuration files is modifiable; any parameter can be overridden as a command line argument by prefixing its name with a double dash (--).

Note 2: For the Act-Replace category (either inference or training), set the parameter invalid_action_replacement to true in default.yaml
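
Combining Notes 1 and 2, the same setting can presumably also be passed directly on the command line instead of editing default.yaml, for example:

python main.py --invalid_action_replacement true --config utils/configs/problem_1.yaml utils/configs/experiment_trnc_c.yaml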

Note 3: For the Act-Mask category (either inference or training), replace PPOModel with MaskablePPOModel in main.py by importing it from the Maskable PPO implementation: from models.maskable_ppo import MaskablePPOModel
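
A minimal sketch of this change in main.py is shown below; the original PPOModel import path and the constructor arguments are assumptions and should be taken from the actual file:

```python
# main.py (sketch of the Act-Mask change)

# Before: the default PPO model (import path assumed)
# from models.ppo import PPOModel
# model = PPOModel(...)

# After: the maskable variant from the Maskable PPO implementation
from models.maskable_ppo import MaskablePPOModel
# model = MaskablePPOModel(...)  # assumed to take the same arguments as PPOModel
```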

Training

python main.py --config utils/configs/problem_1.yaml utils/configs/experiment_tn.yaml --experiment_name custom_experiments --run_name first_train

Note: You may also provide your own custom configuration file

Experiments

Problem Sets and Configuration Variants

The study defines three problem sets and multiple configuration variants to evaluate the performance of RL agents in a CADES (Configurable Adaptive Distributed Execution System). These scenarios aim to emulate the dynamic and unpredictable conditions of real-world systems.

Problem Sets

  1. Problem 1: A static system configuration with fixed tasks and nodes. This serves as a baseline to evaluate basic performance.
  2. Problem 2: Introduces variability in task numbers and costs while keeping nodes constant. It reflects fluctuating task demands with stable hardware resources.
  3. Problem 3: Adds complexity by varying tasks, their costs, and the number of nodes. This scenario includes potential node downtimes, representing real-life challenges with dynamic task demands and resource failures.
| Problem No. | Tasks (#) | Task Cost | Nodes (#) | Node Capacity |
|---|---|---|---|---|
| 1 | 12 | 4 | 6 | 12 |
| 2 | 8 to 10 | 4 to 6 | 6 | 12 |
| 3 | 8 to 10 | 4 to 6 | 6 to 8 | 10 to 12 |

Configuration Variants

To capture different scenarios that may arise during the reconfiguration of a CADES, we propose several distinct configuration variants:

  1. TN: Tasks and nodes are available, but no replicas or communication are required. Represents non-critical, independent task execution scenarios.
  2. TRN: Adds replicas for critical tasks but no communication. Focuses on fault tolerance for critical tasks.
  3. TRNC: Includes tasks, nodes, replicas, and communication, divided into:
    • TRNC A: Communication among non-critical tasks.
    • TRNC B: Communication among critical tasks.
    • TRNC C: Combines communication for both critical and non-critical tasks.

Each of these variants captures different levels of complexity, reflecting the diverse operational conditions that a CADES may encounter during reconfiguration.

| Category | Tasks (T) | Nodes (N) | Replicas (R) | Communication (C) |
|---|---|---|---|---|
| TN | ✓ | ✓ | | |
| TRN | ✓ | ✓ | ✓ | |
| TRNC A | ✓ | ✓ | ✓ | Non-critical tasks only |
| TRNC B | ✓ | ✓ | ✓ | Critical tasks only |
| TRNC C | ✓ | ✓ | ✓ | Both non-critical and critical tasks |

Invalid Action Strategies

Different invalid action handling strategies are employed to conduct a comparative study of their effects on different configuration problems. These techniques are referenced in the results section and are summarized as follows:

  1. Early-Term: Early Termination; the episode is terminated whenever an invalid action is selected.
  2. Act-Replace: Action Replacement; an invalid action is handled through a replacement mechanism.
  3. Act-Mask: Action Masking; the logits of invalid actions are masked out (see the sketch below).

These strategies are evaluated to understand their impact on solving different configuration problems effectively.
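
As a rough, self-contained illustration of the Act-Mask idea only (not the repository's implementation), the sketch below assigns invalid actions a logit of negative infinity so they receive zero probability when sampling:

```python
import numpy as np

def mask_invalid_logits(logits: np.ndarray, valid_mask: np.ndarray) -> np.ndarray:
    """Return logits with invalid actions set to -inf so they get zero probability."""
    return np.where(valid_mask, logits, -np.inf)

def sample_action(logits: np.ndarray, valid_mask: np.ndarray,
                  rng=np.random.default_rng()) -> int:
    masked = mask_invalid_logits(logits, valid_mask)
    # Softmax over the masked logits; invalid actions contribute exp(-inf) = 0
    exp = np.exp(masked - masked.max())
    probs = exp / exp.sum()
    return int(rng.choice(len(probs), p=probs))

# Example: 4 candidate node assignments, nodes 1 and 3 have no spare capacity
logits = np.array([0.2, 1.5, -0.3, 0.7])
valid = np.array([True, False, True, False])
print(sample_action(logits, valid))  # always returns 0 or 2
```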

Summary

The combination of problem sets and configuration variants provides a comprehensive framework for evaluating the RL agent's ability to handle dynamic, real-world challenges in a CADES. These scenarios test the agent's fault tolerance, adaptability, and task allocation efficiency under varying levels of complexity.

Results

Success Rate (%)

| Problem No. | Strategy | TN | TRN | TRNC A | TRNC B | TRNC C |
|---|---|---|---|---|---|---|
| 1 | Early-Term | **100** | 98 | 93 | **97** | **88** |
| 1 | Act-Replace | 99 | 97 | **94** | 95 | **88** |
| 1 | Act-Mask | **100** | **100** | 59 | 60 | 57 |
| 2 | Early-Term | **100** | 96 | **97** | **99** | 91 |
| 2 | Act-Replace | 97 | **99** | 96 | 92 | 90 |
| 2 | Act-Mask | **100** | **99** | 91 | 95 | **96** |
| 3 | Early-Term | 94 | 93 | 88 | 86 | 83 |
| 3 | Act-Replace | **96** | **98** | **90** | **88** | **88** |
| 3 | Act-Mask | 91 | 93 | 80 | 86 | 84 |

Note: Bolded values indicate the highest performance in each category (TN, TRN, TRNC A, TRNC B, TRNC C) for that specific Problem Number.

Detailed results can be found in the paper.

Future Work

  • Optimizing the deep reinforcement learning agent to fulfill message passing among tasks more efficiently
  • Trying other RL approaches, e.g., Q-learning
  • Applying efficient reward shaping strategies
