Humanoid industrial robots have always been a little confusing to me. The human form is not well suited to industrial tasks, and by building specialised robot arms you can improve efficiency and so on. You only need humanoids when the robot has to interact with systems that were designed for humans and can't be modified to work with a more efficient form.
Every single task that was easy and economical to offload to a single purpose robot arm bolted down to the floor was already offloaded to a single purpose robot arm bolted down to the floor.
What remains is: all those quirky little one-off processes that aren't very amenable to "robot arm" automation, aren't worth the process design effort to make them amenable to it, and are currently solved by human labor.
Thus, you design new solutions to target that open niche.
Humans aren't perfect at anything, but they are passable at everything. Universal worker robots attempt to replicate that.
"A drop-in replacement for simple human labor" is a very lucrative thing, assuming one could pull it off. And that favors humanoid hulls.
Not that it's the form that's the bottleneck for that, not really. The problem of universal robots is fundamentally an AI problem. Today, we could build a humanoid body that could mechanically perform over 90% of all industrial tasks performed by humans, but not the AI that would actually make it do it.
My impression is that a big part of the reason for the sudden boom in humanoid robots is that they lend themselves particularly well to RL-based training on human-generated motion data captured via VR. It's much easier to have a robot broadly copy human actions if the robot is shaped like a human, instead of first having to translate each human action into its robot-arm equivalent.
The big part is the rise of modern AI in general.
The success of large multipurpose AI models trained on web-scale data pushed a lot of people towards "cracking general purpose robot AI might be possible within a decade".
Whether transfer learning from human VR/teleop data is the best way to do it remains uncertain; there are many approaches to training and data collection. That said, transfer learning from web-scale data, teleoperation, and "RL IRL" are all common, usually applied at different ends of the training pipeline.
Tesla got the memo earlier than most, because Musk is a mad bleeding edge technology demon, but many others followed shortly before or during the public 2022 AI boom.
That is certainly a factor, but you also have to take into account that all these tasks in the factories are now centered around the human form because humans are doing them.
Yes, that's pretty much it. Some people from Boston Dynamics were talking on a podcast, and they said they had sat down with Toyota and figured out they could automate all the tasks in a factory, but it would take something like 10,000 man-years. And Toyota ships new trims every six months, so you'd need roughly 10,000 man-years every six months.
It's the flexibility and adaptability with minimum training that's required.
Looking at the video at the bottom of the page, the robot looks like an old man, especially in the trash bag throwing sequence. Compare that to the recent Chinese kung-fu robots video...
Completely different situations. The Unitree demos are prerecorded movements with no real adaptability. While visually impressive, they are highly tuned to perform that specific sequence of actions. If you walked in front of one it would have zero awareness of you and you’d be hit. They’re essentially “blind”. The last video here is likely demonstrating a teleoperated humanoid.
> The Unitree demos are prerecorded movements with no real adaptability.
That is not true. The routine is preprogrammed, but there is adaptability: if there weren't, the robot would fall over in the first five seconds. The movements in that routine require continuous adjustment. You can't just record the motion as you would a video game animation; real physics gets in the way, and you end up flat on your back trying to land a jump or a backflip.
If you think I am wrong, sure I could be but have a look at atlas, https://www.youtube.com/watch?v=oe1dke3Cf7I
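That "continuous adjustment" is just high-rate feedback running underneath the scripted routine. A toy sketch of the idea, using an inverted pendulum as a stand-in for the robot's torso lean (the gains, masses, and time step here are illustrative, not anything from Unitree or Boston Dynamics):

```python
import math

def balance_torque(theta, theta_dot, kp=60.0, kd=10.0):
    # PD feedback on the lean angle: this is the layer that keeps a
    # "preprogrammed" routine upright despite disturbances.
    return -kp * theta - kd * theta_dot

def simulate(theta0, dt=0.001, steps=3000, g=9.81, l=1.0, m=1.0):
    # Toy inverted pendulum: gravity tips it over, feedback rights it.
    # theta_ddot = (g/l)*sin(theta) + tau / (m*l^2)
    theta, theta_dot = theta0, 0.0
    for _ in range(steps):
        tau = balance_torque(theta, theta_dot)
        theta_ddot = (g / l) * math.sin(theta) + tau / (m * l * l)
        theta_dot += theta_ddot * dt   # semi-implicit Euler integration
        theta += theta_dot * dt
    return theta  # lean angle after 3 simulated seconds
```

With the feedback term removed, any nonzero starting lean diverges; with it, the lean settles back to zero. That loop has to run every few milliseconds regardless of how "preprogrammed" the choreography on top of it is.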
The robot's motion is not preprogrammed at all. See how much smoother the motion is?
That's because Boston Dynamics uses an approach that calculates and takes the dynamics of motion into account, just like Unitree does.
The Kawasaki approach, by contrast, is clearly to use overwhelming torques to cancel out all the dynamics and produce fully controlled movement. That's exactly what an old man does, and what a robotic arm in a factory does. It's honestly embarrassing: it looks like Kawasaki has made no progress in the last 30 years, and their robots still move like it's 1996.
Have a look here https://underactuated.csail.mit.edu/intro.html for a more in-depth explanation of the difference between the two approaches.
I'm honestly more concerned with your lack of understanding of these topics.
There are two main ways to accomplish what the kung-fu robot does.
First, you train a reinforcement learning policy for balancing, walking, and a set of dynamic movements; then you record the movement you want via motion capture; then you play back the recorded trajectory.
Second, you train a reinforcement learning policy for balancing and walking, but also bake in the recorded movement into the policy directly.
Okay, I lied; there is also a third way. You can use model predictive control, build a balancing objective by hand, and then replay the recorded trajectory. I don't think this method would be as successful for the choreography shown here, but it's what Boston Dynamics did for a long time.
In all of these cases you are still limited to a pre-recorded task description. Is this really that hard to understand? Do you really think someone taught the robot the choreography in Chinese, or by performing the movement in front of the humanoid's camera the way you'd teach a human? Or that the robot came up with the choreography on its own? Because that's the conclusion you have to draw if you deny the methods I described above.
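The second method, baking the recorded motion into the policy, usually works via a tracking reward in the style of DeepMimic: during training, the policy is rewarded for staying close to the mocap reference at each time step. A minimal sketch (the weights and the exponential form are illustrative, not any specific lab's implementation):

```python
import numpy as np

def imitation_reward(q, q_ref, q_dot, q_dot_ref, w_pose=2.0, w_vel=0.1):
    """DeepMimic-style tracking reward.

    q, q_dot: the robot's current joint angles and velocities.
    q_ref, q_dot_ref: the mocap reference at the same time step.
    The RL policy maximizes this, so the choreography ends up baked
    into the policy weights during training.
    """
    pose_err = np.sum((q - q_ref) ** 2)         # joint-angle tracking error
    vel_err = np.sum((q_dot - q_dot_ref) ** 2)  # joint-velocity tracking error
    # Exponentials map "zero error" to maximum reward and decay smoothly,
    # which keeps the gradient signal useful even when tracking is poor.
    return np.exp(-w_pose * pose_err) + 0.1 * np.exp(-w_vel * vel_err)
```

The point is that the reference trajectory appears directly in the objective: the policy learns balance and disturbance rejection on its own, but the sequence of moves is fixed before training ever starts.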
The Kawasaki robot had to do something much more impressive, which is to lift a table while a human holds the other end.
The actual concern here is that there are too many cuts. If the whole table movement sequence was uncut and fully autonomous, that would mean they have the most advanced humanoid robot software in the world.
It would mean the robot can autonomously find the correct grasping locations on the table for both arms, which requires it to have a model of the table. It would also need to know at what height to hold its end to keep the table level, compensate for the human pulling on the object while keeping its balance, and autonomously follow the direction the human is pulling.
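That "follow the direction the human is pulling" part is typically handled with admittance control: the commanded hand velocity yields to the measured external force instead of rigidly holding position. A toy one-dimensional sketch (the virtual mass and damping values are made-up illustrative parameters):

```python
def admittance_step(f_ext, v, dt, M=10.0, D=50.0):
    """One Euler step of 1-D admittance control.

    Simulates a virtual mass-damper at the hand:
        M * v_dot + D * v = f_ext
    f_ext: external force sensed at the wrist (the human's pull), in N.
    v: current commanded hand velocity, in m/s.
    Returns the new commanded velocity, so the robot moves with the
    pull instead of fighting it.
    """
    v_dot = (f_ext - D * v) / M
    return v + v_dot * dt
```

Under a steady pull the hand settles at velocity f_ext / D, and when the human stops pulling the motion damps back to rest. Doing this while also balancing a legged robot and keeping the table level is what would make an uncut autonomous run so impressive.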
Of course, since there were many cuts, we don't really know whether that's true. We also don't know if teleoperation is involved or not.
The Chinese robot dancing is cool, because it shows what the hardware is capable of, but it doesn't really show anything on the software side. Contacts with objects are hard in robotics and the kung-fu choreography avoids them for obvious reasons.