This project implements an RL agent for Dynamic Channel Allocation in a simulated mobile caller environment.
The implementation is written in Haskell and uses Accelerate for the numerical work. It is a near-complete port of the best-performing agent (AA-VNet) from https://github.com/tsoernes/dca. The agent uses a linear neural network as a state-value function approximator, which is trained using a newly proposed average-reward variant of TDC gradients, originally defined for discounted returns in Sutton et al. 2009: "Fast gradient-descent methods for temporal-difference learning with linear function approximation." A sketch of the update rule is given below.
For an introduction to the channel allocation problem and how RL is applied to solving it, see Torstein Sørnes 2018: "Contributions to centralized dynamic channel allocation reinforcement learning agents".
See also the versions written in Rust and Python.
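To make the training rule concrete, here is a minimal sketch of an average-reward TDC step with linear function approximation, using plain lists in place of Accelerate arrays. The names (`tdcStep`, `dot`) and the exact form of the average-reward update are illustrative assumptions, not this repository's API; the three step sizes correspond to the `--learning_rate`, `--learning_rate_grad`, and `--learning_rate_avg` options documented below.

```haskell
-- Hypothetical sketch of the average-reward TDC update; not the repo's code.
dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

-- | One learning step. theta: value weights, w: gradient-correction weights,
-- rho: running average-reward estimate.
tdcStep
  :: Double    -- ^ alpha: learning rate for theta (cf. --learning_rate)
  -> Double    -- ^ beta: learning rate for w (cf. --learning_rate_grad)
  -> Double    -- ^ eta: learning rate for rho (cf. --learning_rate_avg)
  -> [Double]  -- ^ phi: features of the current state
  -> [Double]  -- ^ phi': features of the next state
  -> Double    -- ^ r: observed reward
  -> ([Double], [Double], Double)
  -> ([Double], [Double], Double)
tdcStep alpha beta eta phi phi' r (theta, w, rho) =
  let delta  = r - rho + dot theta phi' - dot theta phi  -- average-reward TD error
      corr   = dot w phi                                 -- correction term w^T phi
      theta' = zipWith3 (\t f f' -> t + alpha * (delta * f - corr * f'))
                        theta phi phi'
      w'     = zipWith (\wi f -> wi + beta * (delta - corr) * f) w phi
      rho'   = rho + eta * delta  -- assumption: rho tracks the TD error
  in (theta', w', rho')
```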
The following builds with -O2 and other optimizations:
stack build --stack-yaml stack-release.yaml
To build without optimizations but with profiling flags, drop the --stack-yaml .. option.
stack exec --stack-yaml stack-release.yaml dca-exe -- --backend cpu
This runs the project; on startup it generates a full computational graph that
contains both the call network simulator and the agent's neural network.
The computational graph is compiled using Accelerate.LLVM.Native and executed
on the CPU. To use Accelerate's built-in interpreter instead, omit the --backend cpu flag.
Support for compiling to the GPU can be obtained by adding the dependency
accelerate-llvm-ptx and switching out the imports in AccUtils.hs, along the lines of the sketch below.
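As a rough illustration of what that switch looks like (hypothetical module contents, not the repository's actual AccUtils.hs), each backend exposes a `run` function with the same signature, so selecting a backend amounts to picking which one to call:

```haskell
-- Hypothetical sketch of a backend switch; not the actual AccUtils.hs.
import Data.Array.Accelerate (Acc, Arrays)
import qualified Data.Array.Accelerate.Interpreter as Interp
import qualified Data.Array.Accelerate.LLVM.Native as Native
-- For GPU support, one would instead depend on accelerate-llvm-ptx and
-- import qualified Data.Array.Accelerate.LLVM.PTX as PTX.

data Backend = Interpreter | CPU

-- | Evaluate a computational graph on the chosen backend.
runBackend :: Arrays a => Backend -> Acc a -> a
runBackend Interpreter = Interp.run
runBackend CPU         = Native.run
```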
To see available options, run:
stack exec --stack-yaml stack-release.yaml dca-exe -- --help
Available options:
--call_dur MINUTES Call duration for new calls. (default: 3.0)
--call_dur_hoff MINUTES Call duration for handed-off calls. (default: 1.0)
--call_rate PER_HOUR Call arrival rate (new calls). (default: 200.0)
--hoff_prob PROBABILITY Hand-off probability. Set to 0 to disable
hand-offs. (default: 0.0)
--n_events N Simulation duration, in number of processed
events. (default: 10000)
--log_iter N How often to show run time statistics such as call
blocking probability. (default: 1000)
--learning_rate F Learning rate for the neural net, i.e. the state
value update. (default: 2.52e-6)
--learning_rate_avg F Learning rate for the average reward
estimate. (default: 6.0e-2)
--learning_rate_grad F Learning rate for gradient
correction. (default: 5.0e-6)
--backend ARG Accepted backends are 'interp' for 'Interpreter' and
'cpu' for 'LLVM.Native'. The interpreter yields better
error messages. (default: Interpreter)
--min_loss F Abort simulation if loss goes below given absolute
value. Set to 0 to disable. (default: 0.0)
--fixed_rng Use a fixed (at 0) seed for the RNG. If this switch
is not enabled, the seed is selected at random.
-h,--help Show this help text
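For example, to run a longer simulation on the compiled CPU backend with hand-offs enabled (using only the flags documented above; the values are illustrative):
stack exec --stack-yaml stack-release.yaml dca-exe -- --backend cpu --n_events 100000 --hoff_prob 0.15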
TODO:
- Implement hand-off look-ahead