Sutton & Barto Gridworld example in C#
Lately I've been working through examples from Sutton and Barto's book "Reinforcement Learning: An Introduction" in C#, and I've already shared a few of them on this blog:
- Tic-tac-toe reinforcement learning with C#
- Ten armed testbed for the Bandit problem with C#
- Multi-armed bandit exercise 2.5 with C#
Today I'll focus on the gridworld example from chapter 3 of the book. The code is available in the existing repo as a new project. Gridworld is a simple example that illustrates the Bellman equations and iterative policy evaluation. The book describes the environment as follows:
The cells of the grid correspond to the states of the environment. At each cell, four actions are possible: north, south, east, and west, which deterministically cause the agent to move one cell in the respective direction on the grid. Actions that would take the agent off the grid leave its location unchanged, but also result in a reward of -1. Other actions result in a reward of 0, except those that move the agent out of the special states A and B. From state A, all four actions yield a reward of +10 and take the agent to A'. From state B, all actions yield a reward of +5 and take the agent to B'.
— Sutton & Barto, Reinforcement Learning: An Introduction, 2nd ed., Chapter 3.
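To make the rules in the excerpt concrete, here is a minimal C# sketch of the transition function together with iterative policy evaluation. It is not the code from the repo: the names `Move`, `GridworldSketch`, `Step` and `EvaluateRandomPolicy` are made up for illustration. The excerpt does not give the grid size or the positions of A, A', B and B', so the sketch assumes the 5x5 layout from the book's figure. It also assumes the equiprobable random policy and a discount factor of 0.9, which is the setting the book uses for this example.

```csharp
using System;

// Hypothetical names for illustration only; not taken from the repo.
public enum Move { North, South, East, West }

public static class GridworldSketch
{
    // Assumption: 5x5 grid, as drawn in the book's figure.
    public const int Size = 5;

    // Assumption: special-state positions as (row, col), with row 0 at the top.
    static readonly (int Row, int Col) A = (0, 1), APrime = (4, 1);
    static readonly (int Row, int Col) B = (0, 3), BPrime = (2, 3);

    // Deterministic transition: returns the next state and the reward.
    public static ((int Row, int Col) Next, double Reward) Step((int Row, int Col) state, Move move)
    {
        // From A and B every action jumps to A' or B' with a fixed reward.
        if (state == A) return (APrime, 10.0);
        if (state == B) return (BPrime, 5.0);

        var (dr, dc) = move switch
        {
            Move.North => (-1, 0),
            Move.South => (1, 0),
            Move.East => (0, 1),
            _ => (0, -1), // West
        };

        int row = state.Row + dr;
        int col = state.Col + dc;

        // Moving off the grid leaves the state unchanged and costs -1.
        if (row < 0 || row >= Size || col < 0 || col >= Size)
            return (state, -1.0);

        return ((row, col), 0.0);
    }

    // Iterative policy evaluation for the equiprobable random policy:
    // v(s) <- sum over actions of 0.25 * (reward + gamma * v(next state)).
    public static double[,] EvaluateRandomPolicy(double gamma = 0.9, double tolerance = 1e-6)
    {
        var v = new double[Size, Size];
        double delta;
        do
        {
            delta = 0.0;
            var updated = new double[Size, Size];
            for (int r = 0; r < Size; r++)
            {
                for (int c = 0; c < Size; c++)
                {
                    double sum = 0.0;
                    foreach (Move m in Enum.GetValues(typeof(Move)))
                    {
                        var result = Step((r, c), m);
                        sum += 0.25 * (result.Reward + gamma * v[result.Next.Row, result.Next.Col]);
                    }
                    updated[r, c] = sum;
                    // Track the largest change in this sweep.
                    delta = Math.Max(delta, Math.Abs(sum - v[r, c]));
                }
            }
            v = updated;
        } while (delta > tolerance);
        return v;
    }
}
```

For example, `GridworldSketch.Step((0, 0), Move.North)` stays in the top-left corner with a reward of -1, while any action from `(0, 1)` (state A under the assumed layout) lands on `(4, 1)` with a reward of +10. `EvaluateRandomPolicy()` sweeps the grid until the largest change in a sweep drops below the tolerance. It builds each sweep from a copy of the previous values; updating the array in place would also converge.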