Semantic 3D perception and VLM-based planning for humanoid loco-manipulation

Short demonstration (YouTube mirror: link)