Most coverage of humanoid robotics has understandably focused on hardware design. Given the frequency with which their developers toss around the phrase “general purpose humanoids,” more attention ought to be paid to the first bit. After decades of single-purpose systems, the jump to more generalized systems will be a big one. We’re just not there yet.
The push to produce a robotic intelligence that can fully leverage the breadth of movements opened up by bipedal humanoid design has been a key topic for researchers. The use of generative AI in robotics has been a white-hot subject recently, as well. New research out of MIT points to how the latter might profoundly affect the former.
One of the biggest challenges on the road to general-purpose systems is training. We have a solid grasp on best practices for training humans to do different jobs. Approaches to training robots, while promising, remain fragmented. There are a number of candidate methods, including reinforcement learning and imitation learning, but future solutions will likely combine these approaches, augmented by generative AI models.
One of the prime use cases suggested by the MIT team is the ability to collate relevant information from the small, task-specific datasets on which robots are typically trained. The method has been dubbed policy composition (PoCo). Tasks include useful robot actions like pounding in a nail and flipping things with a spatula.
“[Researchers] train a separate diffusion model to learn a strategy, or policy, for completing one task using one specific dataset,” the school notes. “Then they combine the policies learned by the diffusion models into a general policy that enables a robot to perform multiple tasks in various settings.”
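To make the composition idea concrete, here is a minimal Python sketch of how several independently trained diffusion policies might be blended into one: at each denoising step, every per-dataset policy predicts the noise to remove, and those predictions are combined with mixture weights. The function names, the weighted-sum rule, and the toy denoisers are illustrative assumptions, not the actual PoCo implementation described in the paper.

```python
# Illustrative sketch of composing diffusion policies (not the authors' code).
# Each per-dataset policy acts as a denoiser eps(action, t, obs) that predicts
# the noise to remove at diffusion step t. Composition here is a weighted sum
# of those predictions at every denoising step. All names and weights below
# are hypothetical.

import numpy as np

def compose_diffusion_policies(denoisers, weights, obs, action_dim,
                               num_steps=50, seed=0):
    """Run a toy reverse-diffusion loop, mixing the noise predictions of
    several per-task/per-domain policies into one action sample."""
    rng = np.random.default_rng(seed)
    action = rng.standard_normal(action_dim)   # start from pure noise
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize mixture weights

    for t in reversed(range(num_steps)):
        # Each policy predicts the noise it would remove at this step.
        eps_preds = [d(action, t, obs) for d in denoisers]
        # Weighted combination of the per-policy predictions.
        eps = sum(w * e for w, e in zip(weights, eps_preds))
        # Simplified update: step the sample toward the denoised estimate,
        # adding a little noise on all but the final step.
        step_size = 1.0 / num_steps
        action = action - step_size * eps
        if t > 0:
            action = action + np.sqrt(step_size) * 0.1 * rng.standard_normal(action_dim)

    return action

# Hypothetical per-dataset policies (stand-ins for trained diffusion models):
real_world_policy = lambda a, t, obs: a - obs          # e.g. trained on real demos
simulation_policy = lambda a, t, obs: a - 0.5 * obs    # e.g. trained in simulation

obs = np.ones(4)   # toy observation vector
action = compose_diffusion_policies(
    [real_world_policy, simulation_policy], weights=[0.6, 0.4],
    obs=obs, action_dim=4)
```

The design choice this is meant to show is that composition happens at inference time: each policy keeps whatever it learned from its own dataset, and only their step-by-step predictions are merged into a single general policy.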
Per MIT, the incorporation of diffusion models improved task performance by 20%. That includes the ability to execute tasks requiring multiple tools, as well as the ability to learn and adapt to unfamiliar tasks. The system combines pertinent information from different datasets into the chain of actions required to execute a task.
“One of the benefits of this approach is that we can combine policies to get the best of both worlds,” says the paper’s lead author, Lirui Wang. “For instance, a policy trained on real-world data might be able to achieve more dexterity, while a policy trained on simulation might be able to achieve more generalization.”
The goal of this specific work is the creation of intelligence systems that allow robots to swap between different tools to perform different tasks. The proliferation of such multi-purpose systems would take the industry a step closer to the general-purpose dream.