Parallel Reinforcement Learning

David White & Daniel Kudenko

The reinforcement learning problem is that of an agent situated in an environment learning how to reach a goal. At each time step, the agent performs an action on the environment and receives a reward for doing so. Using these rewards, it estimates how valuable each state is, according to the expected cumulative reward it would receive starting from that state. As the number of actions performed on the environment increases, the state valuations become increasingly accurate and eventually converge to the optimal values. For complex domains, this convergence can be very slow.
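For concreteness, the quantity being estimated is standardly written as the expected discounted return under a policy \pi; this is the textbook formulation rather than anything specific to this project:

V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s \,\right],

where \gamma \in [0, 1] is a discount factor and r_{t+k+1} is the reward received k+1 steps after visiting state s.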

This project considers parallelization methods that speed up convergence by allowing multiple agents to learn the same problem independently. The agents share the knowledge they have gained (their value estimates) at periodic intervals. To measure its success, the algorithm is evaluated on the mountain car problem, in which a car situated in a steep-sided valley must learn to escape by reaching the goal at the top of one side of the valley. Both the convergence speed and the scalability of the parallel algorithms are evaluated. The parallelization scheme yields considerable improvements, and even when the algorithm is run sequentially, faster learning can be obtained in some situations.
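The abstract does not specify how the shared value estimates are combined, so the following is only a minimal sketch of the periodic-sharing idea: several agents learn a toy random-walk task independently, and every few episodes their value tables are merged. The averaging merge rule, the task, and all names and parameters here are illustrative assumptions, not the paper's algorithm. Running the agents in a loop, as below, also corresponds to the sequential use of the scheme mentioned above.

```python
import random

N_STATES = 19          # linear random walk; states 0 and N_STATES-1 are terminal
N_AGENTS = 4
SHARE_INTERVAL = 10    # episodes between knowledge-sharing steps (assumed)
ALPHA, GAMMA = 0.1, 1.0

def run_episode(values):
    """One TD(0) episode on the random walk; reward +1 at the right end."""
    state = N_STATES // 2
    while 0 < state < N_STATES - 1:
        next_state = state + random.choice((-1, 1))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        target = reward + GAMMA * values[next_state]
        values[state] += ALPHA * (target - values[state])
        state = next_state

def share(tables):
    """Merge knowledge by averaging the value tables element-wise
    (an illustrative merge rule, not necessarily the paper's)."""
    merged = [sum(col) / len(tables) for col in zip(*tables)]
    for table in tables:
        table[:] = merged

tables = [[0.0] * N_STATES for _ in range(N_AGENTS)]
for episode in range(1, 201):
    for table in tables:           # each agent learns independently
        run_episode(table)
    if episode % SHARE_INTERVAL == 0:
        share(tables)              # periodic exchange of value estimates

print([round(v, 2) for v in tables[0][1:-1]])
```

Under this merge rule, each agent's table is pulled toward the group consensus at every sharing step, so exploration performed by one agent benefits all of them between merges.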