Meta Proximal Policy Optimization for Cooperative Multi-Agent Continuous Control
Abstract
In this thesis, we propose Multi-Agent Proxy Proximal Policy Optimization (MA3PO), a novel multi-agent deep reinforcement learning algorithm for cooperative multi-agent continuous control. Our method is motivated by the observation that most existing multi-agent reinforcement learning algorithms focus on discrete state/action spaces and thus become computationally infeasible when extended to environments with continuous state/action spaces.
To address this computational cost and to better model collaboration among agents, we build on the recently successful Proximal Policy Optimization (PPO) algorithm, which explores continuous action spaces effectively, and incorporate the notion of intrinsic motivation via meta-gradient methods to stimulate the behavior of individual agents in cooperative multi-agent settings. To this end, we design proxy rewards that quantify the effect of agent-level intrinsic motivation on the team-level reward, and we apply meta-gradient methods within a learning-to-learn optimization paradigm so that the algorithm can effectively optimize the team-level cumulative reward.
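As a rough illustration of the two ingredients named above, the following minimal Python sketch shows the standard PPO clipped surrogate objective together with a hypothetical proxy reward that adds a weighted intrinsic term to the team reward. The function names, the additive form of the proxy reward, and the weight `eta` are assumptions made for illustration only; the abstract does not specify the thesis's actual formulation, and the meta-gradient update of the intrinsic term is not shown.

```python
from typing import Sequence


def proxy_reward(team_reward: float, intrinsic_reward: float, eta: float) -> float:
    """Hypothetical proxy reward: team reward plus a weighted intrinsic term.

    The additive form and the weight `eta` are illustrative assumptions,
    not the formulation used in the thesis.
    """
    return team_reward + eta * intrinsic_reward


def ppo_clip_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective over a batch.

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i);
    advantages[i] is the advantage estimate for the same sample.
    """
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Clip the probability ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)
```

In this sketch, advantages would be estimated from the proxy reward rather than from the team reward alone, which is one plausible way an intrinsic term could shape each agent's policy update.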
We also conduct experiments on several open multi-agent reinforcement learning benchmark environments with continuous action spaces. The results show that our meta proximal policy optimization algorithm is not only comparable in performance to existing state-of-the-art algorithms, but also significantly reduces training time relative to existing techniques.
Series and Number:
Indiana University Computer Science Technical Reports; TR745
Rights
This work is protected by copyright unless stated otherwise.