Introduction
With the emergence of embodied multi-agent systems in robotics, enabling collaboration among heterogeneous agents—such as humanoids, quadrupeds, and manipulators—has become a fundamental step toward achieving general-purpose, real-world autonomy. These agents must not only coordinate high-level plans but also execute them robustly in complex physical environments.
However, bridging the gap between symbolic planning and continuous control across multiple bodies poses significant challenges, especially when agents possess diverse capabilities and partial observations.
This challenge promotes research on this problem through two distinct yet complementary tracks. By decoupling planning from control, we aim to explore the planning capabilities of vision-language models (VLMs), while also investigating the potential of end-to-end models for coordinated control across multiple robotic arms.
If you are interested in this challenge, please fill out the form below. We will release registration information and provide updates about the competition soon.
Join our Mailing List!
Timeline
- July 15, 2025: Pre-registration Opens
- August 1, 2025: Warm-up Round Begins
- September 1, 2025: Official Round Starts
- October 31, 2025: Official Round Ends
- December 2025: Result Announcement & Award Ceremony
Track 1: Multi-Agent Embodied Planning
This track focuses on high-level task planning across heterogeneous embodied agents. Built upon the ManiSkill platform and RoboCasa dataset, we curate a set of task scenarios involving diverse robot embodiments and complex collaborative goals. Given a structured scene image with multiple candidate agents (humanoids, quadrupeds, manipulators), participants need to complete the following two tasks:
- Select Agents: Choose a subset of appropriate agents from the scene based on a natural language command.
- Assign Actions: Define a sequence of high-level actions for each selected agent to accomplish the collaborative task.
This track evaluates a vision-language model's ability to reason about multi-agent allocation, role assignment, and symbolic planning, simulating real-world cooperation among diverse robots.
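To make the two sub-tasks concrete, here is a minimal Python sketch of what a plan could look like as data: a subset of agents, each with an ordered list of high-level actions. This is only an illustration. The `AgentPlan` class, the `validate_plan` helper, the agent names, and the action strings are all hypothetical and are not the official submission format, which this page does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPlan:
    """High-level action sequence for one selected agent (hypothetical schema)."""
    agent_id: str
    actions: list[str] = field(default_factory=list)


def validate_plan(candidates: set[str], plans: list[AgentPlan]) -> list[str]:
    """Return a list of problems; an empty list means the plan is well-formed."""
    problems = []
    seen = set()
    for plan in plans:
        # Agent selection: every chosen agent must exist in the scene.
        if plan.agent_id not in candidates:
            problems.append(f"unknown agent: {plan.agent_id}")
        if plan.agent_id in seen:
            problems.append(f"duplicate agent: {plan.agent_id}")
        seen.add(plan.agent_id)
        # Action assignment: every chosen agent needs at least one action.
        if not plan.actions:
            problems.append(f"no actions for: {plan.agent_id}")
    if not plans:
        problems.append("no agents selected")
    return problems


# Illustrative scene: agent names and action strings are made up.
candidates = {"humanoid_0", "quadruped_0", "arm_0"}
plans = [
    AgentPlan("arm_0", ["pick(cup)", "place(cup, tray)"]),
    AgentPlan("quadruped_0", ["move_to(kitchen)", "carry(tray)"]),
]
problems = validate_plan(candidates, plans)
```

The check above only tests structural well-formedness (known agents, no duplicates, non-empty action lists). It says nothing about whether the plan would actually accomplish the task, which is what the track evaluates.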
Track 2: Policy Execution for Multi-Agent Control
This track focuses on low-level policy execution in physically realistic simulation environments. It utilizes RoboFactory, a simulation benchmark for embodied agents based on the ManiSkill platform. Participants are required to deploy and control multiple embodied agents (e.g., robotic arms) to collaboratively complete manipulation-centric tasks like block stacking.
Each task is an episode where agents interact with dynamic objects in a shared workspace under partial observability and randomized conditions. The core challenge lies in achieving robust, learned coordination across multiple agents.
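The episode structure described above can be sketched with a toy rollout loop. The environment below is a self-contained stand-in written for illustration, not the RoboFactory or ManiSkill API; the class `ToyTwoArmEnv`, its one-dimensional "block", and the hand-written proportional `policy` are all assumptions. In the actual track the policy would be learned, and observations and actions would be far richer.

```python
import random


class ToyTwoArmEnv:
    """Stand-in environment; NOT the RoboFactory or ManiSkill API."""

    def __init__(self, agent_ids, horizon=5, seed=0):
        self.agent_ids = list(agent_ids)
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0
        self.block_x = 0.0

    def reset(self):
        self.t = 0
        # Randomized condition: the block starts at a random position.
        self.block_x = self.rng.uniform(-1.0, 1.0)
        return self._observe()

    def _observe(self):
        # Partial observability: each agent gets its own noisy reading.
        return {a: self.block_x + self.rng.gauss(0.0, 0.05) for a in self.agent_ids}

    def step(self, actions):
        # Shared workspace: the block moves by the sum of all agents' pushes.
        self.block_x += sum(actions[a] for a in self.agent_ids)
        self.t += 1
        done = self.t >= self.horizon
        return self._observe(), done


def policy(obs):
    # Hand-written placeholder: push the block toward the origin
    # using only this agent's own observation.
    return -0.25 * obs


def run_episode(env):
    obs = env.reset()
    done = False
    steps = 0
    while not done:
        # Each agent acts on its own observation only.
        actions = {agent: policy(o) for agent, o in obs.items()}
        obs, done = env.step(actions)
        steps += 1
    return steps, env.block_x


steps, final_x = run_episode(ToyTwoArmEnv(["arm_0", "arm_1"]))
```

With two agents each contributing a gain of 0.25, the block's offset roughly halves at every step. This coordination only works because both agents push in a consistent direction despite seeing different noisy observations.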
Contact
For any inquiries or further information, please contact us at icmlmarschallenge@gmail.com.
Organizers

Li Kang
Shanghai Jiao Tong University

Yiran Qin
The University of Oxford

Ziye Wang
The University of Hong Kong

Heng Zhou
The University of Science and Technology of China

Rui Li
Shanghai Artificial Intelligence Laboratory

Xiufeng Song
Shanghai Jiao Tong University

Bruno Chen
Carnegie Mellon University

Zhenfei Yin
The University of Oxford

Xiaohong Liu
Shanghai Jiao Tong University

Ruimao Zhang
Sun Yat-sen University

Lei Bai
Shanghai Artificial Intelligence Laboratory

Philip Torr
The University of Oxford