Meta Proximal Policy Optimization for Cooperative Multi-Agent Continuous Control

dc.contributor.author: Fang, Boli
dc.date.accessioned: 2025-11-13T22:24:59Z
dc.date.available: 2025-11-13T22:24:59Z
dc.date.issued: 2022-05
dc.description.abstract: In this thesis, we propose Multi-Agent Proxy Proximal Policy Optimization (MA3PO), a novel multi-agent deep reinforcement learning algorithm that tackles the challenge of cooperative continuous multi-agent control. Our method is driven by the observation that most existing multi-agent reinforcement learning algorithms focus on discrete state/action spaces and thus become computationally infeasible when extended to environments with continuous state/action spaces. To address the issue of computational complexity and to better model intra-agent collaboration, we build on the recently successful Proximal Policy Optimization algorithm, which effectively explores continuous action spaces, and incorporate the notion of intrinsic motivation via meta-gradient methods so as to stimulate the behavior of individual agents in cooperative multi-agent settings. To these ends, we design proxy rewards that quantify the effect of individual agent-level intrinsic motivation on the team-level reward, and apply meta-gradient methods to leverage this addition within a learning-to-learn optimization paradigm so that our algorithm can learn the team-level cumulative reward effectively. Furthermore, we conduct experiments on various open multi-agent reinforcement learning benchmark environments with continuous action spaces. Our results demonstrate that our meta proximal policy optimization algorithm not only achieves performance comparable to existing state-of-the-art algorithms, but also significantly reduces training time complexity compared to existing techniques.
dc.identifier.uri: https://hdl.handle.net/2022/34583
dc.relation.ispartofseries: Indiana University Computer Science Technical Reports; TR745
dc.rights: This work is protected by copyright unless stated otherwise.
dc.rights.uri:
dc.title: Meta Proximal Policy Optimization for Cooperative Multi-Agent Continuous Control

Files

Original bundle

Name: TR745.pdf
Size: 1.12 MB
Format: Adobe Portable Document Format