Scaling Genetic Programming to Complex Reinforcement Learning Tasks

  • Heywood, Malcolm (PI)

Project: Research project

Project Details

Description

Reinforcement learning (RL) describes tasks in which an agent interacts with an environment to maximize its long-term reward. Deep learning has recently made substantial progress on RL tasks with high-dimensional state and action spaces: rather than first engineering a suite of appropriate input features, sensors such as video can be used directly. An enormous number of applications have benefited from this development, from algorithms that play Go and Chess better than humans to new levels of human-competitive performance on robot control tasks. However, one drawback of this approach is that the resulting solutions are generally complex black boxes that require hardware support to deploy, even after training.

We recently proposed an alternative approach to scaling RL to high-dimensional state spaces using genetic programming, in which teams of programs self-organize into graphs called Tangled Program Graphs (TPG). Our initial benchmarking on high-dimensional RL tasks demonstrates that solutions of equivalent quality can be discovered, but at multiple orders of magnitude lower complexity. The proposed research program will greatly expand the TPG approach to efficiently discover solutions to non-reactive RL tasks requiring multiple simultaneous actions per time step. The long-term research program is organized around three objectives:

1) Automatic identification of behavioural subgraphs: provides the basis for task transfer, accelerated training, and increased transparency of machine learning solutions.
2) Multiple concurrent memory models: the basis for scaling TPG to a wide cross-section of non-reactive RL tasks. Without this, it would not be possible to scale to partially observable problems, a class of tasks of widespread impact.
3) Describing actions as multi-dimensional spaces: enables decisions involving multiple real-valued and discrete actions per state to be made simultaneously, a capability that also appears in many applications.

Successful completion of this research program will result in a TPG framework whose solution quality complements that of deep learning. However, TPG constructs solutions by explicitly discovering mechanisms for decomposing the decision-making task. This means that solutions are lightweight, executing in real time without any form of hardware support. The simplicity of solutions will also support insights into attribute support and solution transparency, which is particularly important when attempting to extract knowledge from solutions after training. Success in the proposed research program would demonstrate new models for addressing open-ended questions regarding the application and deployment of RL agents to navigation, motor control, and strategic decision making in real-time, partially observable environments.
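The core TPG execution mechanism referred to above — teams of programs in which the highest-bidding program either emits an atomic action or transfers control to another team — can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the project's implementation: the `Program` and `Team` classes, the linear bid form, and the visited-team cycle handling are all hypothetical stand-ins for the evolved structures TPG actually discovers.

```python
import random

class Program:
    """Stand-in for an evolved program: computes a scalar bid from the
    state, and carries an action that is either an atomic action id (int)
    or a pointer to another Team."""
    def __init__(self, action, n_features, n_terms=4):
        self.action = action
        # Random feature-index/coefficient pairs approximate a linear program.
        self.indices = [random.randrange(n_features) for _ in range(n_terms)]
        self.coeffs = [random.uniform(-1.0, 1.0) for _ in range(n_terms)]

    def bid(self, state):
        return sum(c * state[i] for c, i in zip(self.coeffs, self.indices))

class Team:
    """A team is simply a set of programs that compete by bidding.
    Assumed invariant: every team holds at least one atomic-action program,
    so graph traversal can always terminate."""
    def __init__(self, programs):
        self.programs = programs

def act(root, state):
    """Traverse the graph from the root team: at each team the
    highest-bidding program wins; if its action points to another team,
    control transfers there, otherwise the atomic action is returned.
    Already-visited teams are excluded so cyclic graphs still terminate."""
    team, visited = root, set()
    while True:
        visited.add(id(team))
        eligible = [p for p in team.programs
                    if not isinstance(p.action, Team)
                    or id(p.action) not in visited]
        winner = max(eligible, key=lambda p: p.bid(state))
        if isinstance(winner.action, Team):
            team = winner.action
        else:
            return winner.action

# Tiny two-team graph: the root can either act directly (action 2)
# or defer to a leaf team choosing between actions 0 and 1.
random.seed(0)
leaf = Team([Program(0, 4), Program(1, 4)])
root = Team([Program(2, 4), Program(leaf, 4)])
action = act(root, [0.5, -0.2, 0.9, 0.1])
```

Note that only the teams along one traversal path execute per decision, which is why TPG solutions can remain lightweight even as the graph grows.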

Status: Active
Effective start/end date: 1/1/23 → …

Funding

  • Natural Sciences and Engineering Research Council of Canada: US$21,491.00

ASJC Scopus Subject Areas

  • Genetics
  • Artificial Intelligence
  • Decision Sciences (all)
  • Physics and Astronomy (all)
  • Chemistry (all)
  • Agricultural and Biological Sciences (all)
  • Engineering (all)
  • Management of Technology and Innovation