Nancy Owens Fulda and Professor Todd Peterson, Computer Science
Reinforcement learning is a sub-discipline of machine learning in which an autonomous program, called an agent, learns to behave appropriately in its environment. Appropriate behavior is defined in terms of numerical reinforcements, which the agent receives as rewards or penalties for its actions. By storing a running average of the reinforcements received for each action in each situation, the agent learns which behaviors are most desirable.
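As a concrete, deliberately minimal sketch of this idea, the Python fragment below shows a tabular agent that keeps such a running average for every state-action pair it encounters. The class and parameter names are my own illustration, not part of any particular system.

```python
import random
from collections import defaultdict

class RunningAverageAgent:
    """Tabular agent that keeps a running average of the
    reinforcement received for each state-action pair."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = actions            # actions available to the agent
        self.epsilon = epsilon            # how often to explore at random
        self.value = defaultdict(float)   # average reward per (state, action)
        self.count = defaultdict(int)     # visits per (state, action)

    def choose_action(self, state):
        # Occasionally explore; otherwise exploit the best-known action.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[(state, a)])

    def update(self, state, action, reward):
        # Incrementally update the running average of the reinforcement.
        key = (state, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]
```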
One major difficulty for reinforcement learning systems is information overload, commonly referred to as the “curse of dimensionality.” As the number of perceptual inputs and available actions grows, the agent is overwhelmed by an exponentially increasing number of state-action pairs. An agent with ten binary sensors and four actions, for example, already faces 2^10 × 4 = 4,096 state-action pairs; thirty sensors push the count past four billion. The problem quickly becomes so complex that the agent can no longer learn an optimal mapping from states to actions in a reasonable amount of time.
One way to combat the curse of dimensionality is to use a “divide and conquer” strategy. Instead of training only a single reinforcement learning agent on the complex task, the researcher decomposes the agent into a group of simpler agents. Acting in unison, these agents can perform the same task as the original agent, but because the sub-task to be learned by each member of the decomposed system is simpler, the effects of the curse of dimensionality are reduced.
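To make the contrast concrete, here is a hypothetical decomposition of a small robot-control task, reusing the RunningAverageAgent sketch above. The task and action names are invented for illustration; the point is only that two small tables replace one large one.

```python
# Hypothetical joint task: choose a movement and a gripper action each step.
move_actions = ["forward", "back", "left", "right"]
grip_actions = ["open", "close"]

# Monolithic agent: one table entry per JOINT action (4 x 2 = 8 per state).
monolithic = RunningAverageAgent(
    [(m, g) for m in move_actions for g in grip_actions])

# Decomposed system: two simpler agents (4 + 2 = 6 entries per state).
# The gap widens rapidly as more action dimensions are added.
mover = RunningAverageAgent(move_actions)
gripper = RunningAverageAgent(grip_actions)

def joint_action(state):
    # Acting in unison: each sub-agent independently picks its component.
    return (mover.choose_action(state), gripper.choose_action(state))
```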
My research centered on analyzing the effectiveness of such decompositions and the means by which effective decompositions can be chosen. I began by studying the different ways in which a single agent can be decomposed into a set of functionally equivalent agents, and by determining a set of rules that relate the new multi-agent system to the original agent. I also ran several empirical experiments with single reinforcement learning agents and corresponding multi-agent systems to determine the impact of agent decomposition on learning speed and effectiveness.
I found that although agent decomposition can effectively combat the curse of dimensionality, it also brings with it several problems unique to multi-agent systems. The agents in the decomposed system often fail to coordinate their actions effectively; thus, although each agent has a simpler problem to solve, the individual solutions are not always useful in combination. The result is like two musicians playing a piano duet without paying attention to each other's tempo. The two parts might happen to coincide rhythmically, but they are far more likely to be chaotic and out of sync.
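A toy example makes the difficulty explicit. Suppose, purely hypothetically, that reward is earned only when two agents' choices match. While its partner is still exploring, each agent's running averages are identical for both of its actions, so its table offers no guidance toward coordination.

```python
# Reward depends on the JOINT action, but each agent sees only its own choice.
joint_reward = {
    ("A", "A"): 10, ("B", "B"): 10,   # in sync: high reward
    ("A", "B"): 0,  ("B", "A"): 0,    # out of sync: nothing
}

# Against a partner that still picks A or B equally often, either action
# averages a reward of 5 -- no signal about which duet part to commit to.
for my_action in ("A", "B"):
    avg = sum(joint_reward[(my_action, p)] for p in ("A", "B")) / 2
    print(my_action, avg)  # both lines print 5.0
```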
Fortunately, not every decomposition is subject to this coordination difficulty. In some decompositions the agents' actions neither interfere with nor depend upon each other's behavior, a situation I loosely term a dominant-strategy decomposition. It can be proven that at least one such decomposition exists for every reinforcement learning task.
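One simple way such a decomposition can arise (again a hypothetical illustration, not the general proof) is when the joint reward splits additively across the agents, since each agent then maximizes its own term regardless of what the others do:

```python
# Invented reward tables with additive structure.
move_reward = {"forward": 3, "back": 0, "left": 1, "right": 1}
grip_reward = {"open": 0, "close": 2}

def reward(move, grip):
    # Because the reward is a sum of per-agent terms, "forward" is best
    # for the mover and "close" is best for the gripper no matter what
    # the other agent chooses -- a dominant strategy for each.
    return move_reward[move] + grip_reward[grip]
```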
Sometimes a dominant-strategy decomposition can be identified at design time through a simple, common-sense analysis of the problem. Often, however, the designer cannot plan an effective decomposition until the entire structure of the task has been learned by the original agent. Thus, by the time a dominant-strategy decomposition has been identified, the task has already been learned! As a result, the usefulness of such decompositions is limited.
One alternative to trying to find a dominant-strategy decomposition is to take any given decomposition and attempt to transform it into a dominant-strategy decomposition by providing communication between the agents at critical junctures. This approach maintains many of the benefits of agent decomposition in combating the curse of dimensionality, while neatly side-stepping the coordination problems often associated with multi-agent systems. In many cases, a very small amount of communication is sufficient to enable a multi-agent system to solve the task effectively.
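In the running robot example, such a communication link might be as simple as letting one agent see the other's chosen action before it commits. The sketch below is my own illustration of the idea, not the exact scheme used in my experiments; it simply folds the message into the second agent's state.

```python
def coordinated_step(state):
    move = mover.choose_action(state)
    # Critical-juncture communication: the gripper is told which movement
    # was just chosen, and treats that message as part of its state.
    grip = gripper.choose_action((state, move))
    return (move, grip)
```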
In an extension of my previous work, I am currently designing an algorithm for dynamic communication in multi-agent systems. This would result in a system in which the agents themselves identify which communication links are useful in the current task, and which are not. Ideally, the agents will find an effective middle ground between information overload and information deprivation by selecting precisely the right amount of relevant information.
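The algorithm itself is still under development, but the underlying idea can be sketched: each agent keeps a running estimate of how much each incoming communication link actually helps, and listens only to links whose estimated benefit stays above a threshold. All names and details here are placeholders for illustration, not the final design.

```python
class CommunicationGate:
    """Hypothetical sketch of dynamic link selection."""

    def __init__(self, links, threshold=0.0):
        self.benefit = {link: 0.0 for link in links}  # running benefit estimate
        self.count = {link: 0 for link in links}
        self.threshold = threshold

    def active_links(self):
        # Listen only to links that currently appear to pay off.
        return [l for l, b in self.benefit.items() if b > self.threshold]

    def record(self, link, reward_with, reward_without):
        # Running average of the reward difference attributable to the link.
        self.count[link] += 1
        delta = reward_with - reward_without
        self.benefit[link] += (delta - self.benefit[link]) / self.count[link]
```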
Preliminary experiments with this new algorithm show promise. In very simple environments, the algorithm effectively distinguishes relevant perceptual information from irrelevant “noise,” even when the extra information is only marginally relevant. Further work in this direction would extend these preliminary studies to more complex environments with dozens of simultaneously learning agents.
This research has been a valuable learning experience for me. I have especially benefited from the chance to conduct my own research and then study papers from related conferences and journals to understand how my work relates to that of other researchers. The process has widened my understanding of the relevant issues in my field, as well as helped me to find my own research niche. I feel that I can push forward with more confidence than before, because I have a clear sense of how my work fits into the “Big Picture.”