Project details
Description
The general context for this research proposal is that of genetic programming (GP) as applied to learning decision-making policies for agents operating in environments with delayed payoff (i.e., reinforcement learning). The specific focus of the proposal lies in developing a framework for systematically scaling GP to more difficult versions of tasks with delayed payoff than have previously been considered. In particular, we are interested in scenarios in which solutions to a simpler initial 'source' task are then 'transferred' to a more difficult but related 'target' task; that is, a form of transfer learning. The insight of this work is to exploit the capability of GP to identify solutions that make use of subsets of the state variables. This provides the basis for redeploying solutions discovered under the source task so that more difficult tasks can be solved. The potential benefits of adopting such an approach are that: 1) it is not necessary to continuously rediscover policies from scratch; 2) greater success on, or better solutions to, the ultimate target task; and 3) lower computational overhead relative to finding solutions to each task independently.

Two specific target domains will be used to illustrate the approach: 1) learning policies to play soccer in the continuous-valued simulated 2D world of RoboCup keepaway; and 2) learning general policies for solving the 3-by-3 Rubik's cube. The keepaway soccer task represents a benchmark for multi-agent learning, hence has a history of previous results as well as posing tasks of incrementally increasing difficulty. The Rubik's cube task has little previous history as a benchmark for learning algorithms of any form; instead, solutions have typically taken the form of exhaustive search. Both tasks represent examples of complex task domains that have very large state spaces (potential numbers of legal states) but possess underlying properties (regularities) of which GP should be able to discover specific instances. The basic hypothesis of this research is that, once an instance of a context-dependent strategy is identified for solving some subset of an initial task, we should be able to use it as the basis for generalizing to many more instances of the task through a deterministic process of variation in the policy's references to the state variables. Naturally, for the process to scale, we need to avoid artificially introducing pathologies into the search. In short, the sum of initial policies from which a later policy is constructed needs to exceed the mere sum of its parts.

Success in these objectives would provide a general framework for scaling GP to a wide range of tasks with delayed payoff. Such tasks are of widespread interest to the GP community because they represent some of the most expensive, if not the most expensive, task domains to which GP is applied. Moreover, the two task domains are widely acknowledged to be of a particularly challenging nature.
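To make the transfer mechanism concrete, the sketch below illustrates one way a GP policy that references only a subset of state variables could be redeployed on a larger target task by deterministically remapping its state-variable indices. This is a minimal, hypothetical sketch: the tree representation, the function names (`evaluate`, `remap`), and the example index mapping are assumptions for illustration only, not the project's actual implementation.

```python
import operator

# A tiny policy representation: nested tuples (op, left, right) with leaves
# either ("var", index) into the state vector or ("const", value).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(node, state):
    """Evaluate a policy tree against a state vector."""
    kind = node[0]
    if kind == "var":
        return state[node[1]]
    if kind == "const":
        return node[1]
    return OPS[kind](evaluate(node[1], state), evaluate(node[2], state))

def remap(node, index_map):
    """Return a copy of the tree with its state-variable indices remapped,
    redeploying a source-task policy on a target task's larger state vector."""
    kind = node[0]
    if kind == "var":
        return ("var", index_map[node[1]])
    if kind == "const":
        return node
    return (kind, remap(node[1], index_map), remap(node[2], index_map))

# Illustrative source policy (e.g., evolved on a small keepaway scenario)
# referencing state variables 0 and 2 only.
source_policy = ("sub", ("var", 0), ("mul", ("const", 0.5), ("var", 2)))

# On a larger target task the state vector has more entries; one deterministic
# variation maps the referenced variables onto their target-task counterparts.
target_policy = remap(source_policy, index_map={0: 0, 2: 4})

print(evaluate(source_policy, [1.0, 9.9, 2.0]))            # source-task state
print(evaluate(target_policy, [1.0, 9.9, 0.0, 3.3, 2.0]))  # target-task state
```

Because the remapping is purely syntactic, many target-task policies can be enumerated from a single source solution without rerunning the evolutionary search, which is the computational saving the proposal refers to.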
Status | Active |
---|---|
Start date/End date | 1/1/18 → … |
Funding
- Natural Sciences and Engineering Research Council of Canada: US$13,892.00
ASJC Scopus Subject Areas
- Genetics
- Computer Science (all)