We introduce ELA-ZSON, an Efficient Layout-Aware Zero-Shot Object Navigation (ZSON) approach designed for complex multi-room indoor environments.
By planning hierarchically over a global topology map with layout information and a detailed scene representation memory, ELA-ZSON achieves navigation that is both efficient and effective. The process is managed by an LLM-powered agent, which handles every stage from exploration to navigation without human interaction, complex rewards, or costly training.
Our experimental results on the HM3D benchmark demonstrate a 16.7 percentage-point improvement in navigation success rate and a 12% improvement in success rate weighted by path length (SPL) compared to state-of-the-art (SOTA) methods. Furthermore, we validate the robustness of our approach through real-world robotic deployment, showing that it works in practical scenarios.
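For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how success rate and success weighted by path length are commonly computed in embodied navigation. It follows the standard definition, in which each successful episode is weighted by the ratio of the shortest-path length to the longer of the shortest-path and actual path lengths. The function and argument names are illustrative assumptions and are not taken from the ELA-ZSON code.

```python
def success_rate(successes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by path length (standard definition).

    successes:        1 or 0 per episode
    shortest_lengths: shortest-path length from start to goal (assumed > 0)
    path_lengths:     length of the path the agent actually travelled
    """
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if s:
            # A successful episode scores 1.0 only if the agent took the
            # shortest path; longer paths are penalised proportionally.
            total += shortest / max(taken, shortest)
    return total / len(successes)
```

With three episodes, one failure, and one success that took twice the optimal distance, `spl([1, 0, 1], [2.0, 3.0, 4.0], [4.0, 3.0, 4.0])` evaluates to 0.5, while the success rate is 2/3.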
Navigation examples
Obstacle avoidance examples
scene 17DRP5sb8fy
scene 2t7WUuJeko7
scene HxpKQynjfin
scene RPmzsHmrrY
The LLM agent takes user instructions as input and selects among the available actions according to the prompts and data flow. The actions include exploring and recording the scene, constructing the scene memory representation, planning, and executing navigation. The agent can also recognize obstacles and errors and retry iteratively.
The design has two main components: hierarchical planning based on the topo-layout map and dense scene memory, and an LLM-powered agent that conducts the whole process from exploration to exploitation.
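The control flow described here can be sketched as a small dispatch loop with retries. This is only an illustration under stated assumptions: the action names mirror the four actions listed above, `rule_based_chooser` is a hypothetical stand-in for the LLM's decision, and each handler is assumed to return `True` on success and `False` when an obstacle or error is detected.

```python
# Action names mirror the four actions described in the text.
ACTIONS = ["explore", "build_memory", "plan", "navigate"]


def rule_based_chooser(completed):
    """Hypothetical stand-in for the LLM: pick the first unfinished action."""
    for action in ACTIONS:
        if action not in completed:
            return action
    return None  # nothing left to do


def run_agent(handlers, chooser=rule_based_chooser, max_attempts=3):
    """Run actions until the chooser returns None, retrying failed ones.

    handlers maps an action name to a zero-argument callable that returns
    True on success and False on a detected obstacle or error.
    Returns a log of (action, attempt, succeeded) tuples.
    """
    completed = []
    log = []
    while True:
        action = chooser(completed)
        if action is None:
            return log
        for attempt in range(1, max_attempts + 1):
            ok = handlers[action]()
            log.append((action, attempt, ok))
            if ok:
                completed.append(action)
                break
        else:
            # Every attempt failed; give up instead of looping forever.
            raise RuntimeError(f"{action} failed after {max_attempts} attempts")
```

In a real system the chooser would be an LLM call conditioned on the user instruction and the current state; the retry loop is what lets a navigation attempt blocked by an obstacle be repeated.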
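The hierarchical planning idea can be illustrated with a two-level sketch: a coarse route over a room-level topology graph, followed by a lookup of the target's position in a per-room scene memory. The data structures below (an adjacency dictionary for rooms and a dictionary of object positions per room) and the use of breadth-first search are assumptions made for illustration, not the representation or planner used by ELA-ZSON.

```python
from collections import deque


def room_route(topology, start, goal):
    """Breadth-first search over the room adjacency graph.

    Returns the list of rooms from start to goal, or None if unreachable.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        if room == goal:
            path = []
            while room is not None:
                path.append(room)
                room = parents[room]
            return path[::-1]
        for neighbour in topology.get(room, []):
            if neighbour not in parents:
                parents[neighbour] = room
                queue.append(neighbour)
    return None


def plan(topology, scene_memory, start_room, target_object):
    """Coarse-to-fine plan: find the room holding the target, then route to it.

    scene_memory maps room -> {object name: (x, y) position}.
    Returns (room route, object position), or None if no plan exists.
    """
    for room, objects in scene_memory.items():
        if target_object in objects:
            route = room_route(topology, start_room, room)
            if route is not None:
                return route, objects[target_object]
    return None
```

For example, with rooms `kitchen - hall - bedroom - bathroom` connected in a chain and a towel stored in the bathroom's memory, `plan(...)` from the kitchen returns the four-room route together with the towel's stored position; a local planner would then handle motion within each room.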