Friday, December 19, 2025

Reinforcement Learning

An important ML training paradigm is Reinforcement Learning (RL). RL models rely on a reward value, generated at the end of each training run/episode, to update the parameters (weights) of the model. This is different from other ML methods such as Supervised Learning, where the model learns from labelled data/examples. It is also different from the Unsupervised Learning approach, where the inherent features of unlabelled data are explored by the model during the learning phase to identify clusters, etc.
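
As a rough illustration of this reward-driven loop, here is a minimal REINFORCE-style sketch in PyTorch. The CartPole environment, network sizes, and learning rate are illustrative choices only, not taken from the ported examples:

    import torch
    import gymnasium as gym

    env = gym.make("CartPole-v1")              # illustrative environment
    policy = torch.nn.Sequential(              # tiny policy network
        torch.nn.Linear(4, 32),
        torch.nn.ReLU(),
        torch.nn.Linear(32, 2),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # The episode's total reward drives the parameter update
    # (no discounting or baseline, to keep the sketch minimal).
    episode_return = sum(rewards)
    loss = -torch.stack(log_probs).sum() * episode_return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()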

The keras-io examples repository includes some RL implementations such as actor_critic, ppo, etc. All of them work solely with the TensorFlow (tf) backend. In keras_io_examples_rl these have been ported to the Torch/PyTorch backend. The typical changes include (illustrative sketches follow the list):

  • Torch imports
  • Use a torch-specific optimizer - torch.optim.Adam (see the optimizer sketch below)
    • deep_q_network_breakout_pytorch requires gradient clipping, which in torch is done before optimizer.step() (see the clipping sketch below)
  • Gradient computations in torch
    • Replace tf GradientTape with torch autograd (see the autograd sketch below)
    • Disable gradients globally with torch.set_grad_enabled(False)
    • Enable autograd within the specific flows/methods where it is needed
    • Call loss.backward() and optimizer.step() for backpropagation
  • A few torch-specific tensor & function changes/wrappers (see the wrapper sketch below)
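
Optimizer sketch. The swap itself is usually a one-liner; the placeholder model and learning rate here are illustrative:

    import torch

    model = torch.nn.Linear(8, 2)   # placeholder model for illustration

    # Keras/TF:  optimizer = keras.optimizers.Adam(learning_rate=1e-3)
    # Torch:     the optimizer takes the model's parameters explicitly.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)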
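
Clipping sketch. Keras/TF configures clipping on the optimizer itself (e.g. Adam(..., clipnorm=1.0)), while torch makes it an explicit call between backward() and step(). The model, loss, and clip value below are placeholders:

    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)

    loss = model(torch.randn(8, 4)).pow(2).mean()   # placeholder loss

    optimizer.zero_grad()
    loss.backward()
    # Clip gradient norms after backward() and before step().
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()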
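
Autograd sketch, showing the GradientTape replacement pattern together with the global disable/local enable of gradients. The model, data, and MSE loss are placeholders standing in for the actual RL losses:

    import torch

    torch.set_grad_enabled(False)   # gradients off globally, as in the ports

    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(16, 4)
    y = torch.randn(16, 1)

    # TF original, for comparison:
    #   with tf.GradientTape() as tape:
    #       loss = loss_fn(y, model(x))
    #   grads = tape.gradient(loss, model.trainable_variables)
    #   optimizer.apply_gradients(zip(grads, model.trainable_variables))

    with torch.enable_grad():       # re-enable autograd just for the update
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()             # autograd computes the gradients
        optimizer.step()            # apply the parameter update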
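
Wrapper sketch. These are typical one-line substitutions; the exact set varies per example:

    import numpy as np
    import torch

    obs = np.zeros(4, dtype=np.float32)   # e.g. an environment observation

    state = torch.from_numpy(obs)    # tf.convert_to_tensor -> torch.from_numpy
    state = state.unsqueeze(0)       # tf.expand_dims(x, 0)  -> x.unsqueeze(0)
    scalar = state.sum().item()      # scalar tensor.numpy() -> tensor.item()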

The ported PyTorch-compatible files are:


References

  • Barto, Sutton & Anderson (1983), Neuronlike adaptive elements that can solve difficult learning control problems: http://www.derongliu.org/adp/adp-cdrom/Barto1983.pdf
  • https://hal.inria.fr/hal-00840470/document
  • Watkins & Dayan (1992), Q-learning: https://link.springer.com/content/pdf/10.1007/BF00992698.pdf
  • Mnih et al. (2015), Human-level control through deep reinforcement learning: https://www.semanticscholar.org/paper/Human-level-control-through-deep-reinforcement-Mnih-Kavukcuoglu/340f48901f72278f6bf78a04ee5b01df208cc508
  • Lillicrap et al., Continuous control with deep reinforcement learning (DDPG): https://arxiv.org/abs/1509.02971
  • Gymnasium documentation: https://gymnasium.farama.org/
  • Sutton & Barto, Reinforcement Learning: An Introduction
