Introduction
With the emergence of embodied multi-agent systems in robotics, enabling collaboration among heterogeneous agents—such as humanoids, quadrupeds, and manipulators—has become a fundamental step toward achieving general-purpose, real-world autonomy. These agents must not only coordinate high-level plans but also execute them robustly in complex physical environments.
However, bridging the gap between symbolic planning and continuous control across multiple bodies poses significant challenges, especially when agents possess diverse capabilities and partial observations.
This challenge aims to promote research on this topic through two distinct yet complementary tracks. By decoupling planning from control, the tracks separately explore the planning capabilities of vision-language models (VLMs) and the potential of end-to-end models for coordinated control of multiple robotic arms.
If you are interested in this challenge, please register by filling out the form below. We will release more information and provide updates about the competition soon.
Registration · Discord · WeChat Group

Timeline
August 18, 2025: Warm-up Round Starts
September 1, 2025: Official Round Starts
October 31, 2025: Official Round Ends
December 2025: Result Announcement
Track 1: Multi-Agent Embodied Planning
This track focuses on high-level task planning across heterogeneous embodied agents. Built upon the ManiSkill platform and the RoboCasa dataset, we curate a set of task scenarios involving diverse robot embodiments and complex collaborative goals. Given a structured scene image containing multiple candidate agents (humanoids, quadrupeds, manipulators), participants must complete the following two tasks:
- Select Agents: Choose a subset of appropriate agents from the scene based on a natural language command.
- Assign Actions: Define a sequence of high-level actions for each selected agent to accomplish the collaborative task.
This track evaluates the ability of vision-language models (VLMs) to reason about multi-agent allocation, role assignment, and symbolic planning, simulating real-world cooperation among diverse robots.
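To make the two-part output concrete, here is a minimal sketch of how a Track 1 answer (a set of selected agents, each with an ordered list of high-level actions) might be represented and sanity-checked. This is not the official submission format: the `Plan` class, the `validate_plan` function, the agent ids, and the action strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical container for a Track 1 answer (not the official format)."""

    # Maps each selected agent's id to its ordered list of high-level actions.
    actions: dict[str, list[str]] = field(default_factory=dict)

    @property
    def selected_agents(self) -> list[str]:
        # The selected agents are exactly the keys of the action map.
        return list(self.actions)


def validate_plan(plan: Plan, scene_agents: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan is well-formed."""
    problems = []
    if not plan.actions:
        problems.append("no agents selected")
    for agent, steps in plan.actions.items():
        # Every selected agent must actually appear in the scene.
        if agent not in scene_agents:
            problems.append(f"unknown agent: {agent}")
        # Every selected agent must be given at least one action.
        if not steps:
            problems.append(f"agent {agent} has no actions")
    return problems


# Hypothetical scene and plan, for illustration only.
scene = {"humanoid_0", "quadruped_0", "arm_0", "arm_1"}
plan = Plan({
    "arm_0": ["pick(apple)", "place(apple, counter)"],
    "quadruped_0": ["navigate(kitchen)"],
})
print(plan.selected_agents, validate_plan(plan, scene))
```

Keeping agent selection implicit in the keys of the action map ensures that the two sub-tasks cannot disagree: an agent is selected if and only if it has been assigned actions.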
Track 2: Policy Execution for Multi-Agent Control
This track focuses on low-level policy execution in physically realistic simulation environments. It uses RoboFactory, a simulation benchmark for embodied agents built on the ManiSkill platform. Participants are required to deploy and control multiple embodied agents (e.g., robotic arms) that collaboratively complete manipulation-centric tasks such as block stacking.
Each task runs as an episode in which agents interact with dynamic objects in a shared workspace under partial observability and randomized conditions. The core challenge lies in achieving robust, learned coordination across multiple agents.
Contact
For any inquiries or further information, please contact us at marschallenge2025@gmail.com.
Organizers

Li Kang
Shanghai Jiao Tong University

Yiran Qin
The University of Oxford

Xiufeng Song
Shanghai Jiao Tong University

Ziye Wang
The University of Hong Kong

Stone Tao
UC San Diego

Heng Zhou
University of Science and Technology of China

Rui Li
Central South University

Bruno Chen
Carnegie Mellon University

Ximeng Meng
Tongji University