Faster Learning, Smarter Robots: The Future of AI and Robotics

Taffy Das
Apr 14, 2023

Note: If you would like to watch a video version of this article with more visual examples, then click on the link below to watch it. Enjoy!

The robotic revolution is in full effect, and you probably don't know it yet. Incredible progress in robotics is just around the corner, and it will happen at an unprecedented rate. The last decade or two has been just a warm-up act for what's about to come! From manufacturing to shipping, robots have already infiltrated many industries. But that's not all: humanoid robots that can help you around the house may finally be realized in this decade. Like me, some of you already have robots in your homes. Yes, the Roomba. It's just a basic cleaning bot, but we're in the very early stages of learning how to live with these machines. Living with robots that help us with our chores will be the norm in a few years. Why do I think this? Well, Google and Microsoft have each come up with innovative ways of teaching robots to learn much more than they ever did, using some interesting AI solutions. One of these solutions involves using ChatGPT to teach robots faster. If you're ready to learn more, so am I; let's dive in right away!

Before we discuss the research from both Microsoft and Google, let's take a minute or two to understand the landscape of robotics over the years. Did you know that robots have been around since ancient times? That's right; people have been designing self-operating devices for centuries, from purely mechanical automata to advanced versions powered by electronics. Now we're beginning to see AI and robotics integrated more frequently, which is the next step in the robot revolution. Today, robots have the ability to process data, learn from their environment, and make decisions based on that information. One company that has been at the forefront of the industry is Boston Dynamics. Their robots are not just advanced; they're also incredibly entertaining. Who can forget the viral video of Spot dancing to 'Uptown Funk' years ago? Or Atlas doing parkour like a boss? Over the years, their robots have become more agile, even pulling off very challenging angled flips with ease. Incredible stuff. Surprisingly, despite their mind-blowing abilities, these robots have less AI than you would imagine; the company has been focused on making more dexterous and durable robots since its inception in 1992. However, they recently announced a $400 million partnership with Hyundai for their newly formed AI Institute. This is a well-thought-out plan from Boston Dynamics to heavily integrate AI into its already stellar robotics business. I'm very excited about this: we've only seen fun projects come out of the lab so far, so who knows what else they may be cooking up after some advanced AI integration?

Back to the main topic of this article: there has been a considerable push to make robot interactions much easier. Usually, getting a robot to do a task requires some technical prowess or predetermined steps the robot has to follow. What if you could simply tell the robot what you wanted in English, or any other language, and it obeyed your command? Google was one of the first labs to tackle this with their PaLM-SayCan model, which I covered a few months back (it now feels like a lifetime ago). You can ask the robot for a drink or ask it to get rid of your trash, and it will do exactly that. I'll put the link in the resources below so you can check it out. So we've discussed a bit about how robotics has changed over the years, but let's thoroughly explore one of the recent innovations and how it may transform the robotics industry for good. I'm talking about Microsoft's new way of interacting with robots using ChatGPT. Yes, yes, I know the ChatGPT craze is still going strong, and you're probably jaded about it by now, but I promise, if you stick with me, you'll understand why this is so important in the field of robotics. The goal of Microsoft's research is to see if ChatGPT can go beyond text and reason about the physical world to help robots with their tasks. ChatGPT is a language model trained on a massive amount of text and human interactions, but can it really be used to control physical objects too? By using special prompts, high-level APIs, and human feedback, Microsoft's design can guide language models to solve robotic tasks. The traditional way of controlling robots involves writing code by hand, which can be slow, expensive, and inefficient. With ChatGPT, the scientists describe what they want, the code is generated, and high-level feedback is provided if required. This unlocks a new way of controlling robots using natural language, and it can be applied to drones, robot arms, or even home assistant robots. See the videos linked in the resources for concrete examples.
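To make this concrete, here is a minimal sketch of the prompting pattern, not Microsoft's actual code: expose a small, high-level robot API to the model, ask it to write a program against that API, and keep a human in the loop to review the output. The three API functions (move_to, grab, release) are hypothetical placeholders, and the example assumes the pre-1.0 openai Python client with an API key in the environment.

```python
# A minimal sketch of the prompting pattern (not Microsoft's actual code).
# Assumes the pre-1.0 openai client and that OPENAI_API_KEY is set in the
# environment; move_to/grab/release are hypothetical robot API placeholders.
import openai  # pip install "openai<1.0"

SYSTEM_PROMPT = """You control a robot arm through this Python API only:
  move_to(x, y, z)   # move the gripper to a position, in meters
  grab()             # close the gripper
  release()          # open the gripper
Respond with Python code only."""

def generate_robot_code(task: str) -> str:
    """Ask the language model to translate a natural-language task into API calls."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

code = generate_robot_code(
    "Pick up the block at (0.3, 0.1, 0.02) and place it at (0.5, 0.1, 0.02)."
)
print(code)  # A human reviews this code before it ever runs on real hardware.
```

The key design choice is constraining the model to a small, well-documented API surface: that is what makes the generated code short enough for a human to audit before execution.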

In one scenario, ChatGPT was used to accurately control a drone by parsing the prompt and using geometric cues. In another example, ChatGPT taught a robot arm how to build the Microsoft logo out of wooden blocks: the logo was described to ChatGPT, and it generated high-level code for composing the final form (a sketch of what such generated code might look like follows below). Overall, this is the first step toward something larger. The barrier to entry into the field will be lowered, allowing people to communicate with robots through high-level commands while monitoring how well the robots perform during the testing phase. OpenAI also released ChatGPT plugins, where anyone can add helpful tools that turn ChatGPT from a regular Joe into Super Saiyan mode. We can expect plugins in the near future that connect to embodied agents which can do tasks in the physical world. GPT-4 came with eyes for images; soon it may come with legs, hands, or wings, depending on what you need it for.
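For illustration, here is the kind of high-level program the model might emit for the block example. The helper names (pick_block, place_block), the coordinates, and the 2x2 color layout are invented for this sketch; print-stubs stand in for real perception and motion planning so the snippet runs on its own.

```python
# Illustrative only: the kind of high-level program a language model might emit
# for the block-building example. Helper names and coordinates are invented.

def pick_block(color: str) -> None:
    print(f"picking up the {color} block")         # stand-in for perception + grasping

def place_block(x: float, y: float) -> None:
    print(f"placing block at ({x:.2f}, {y:.2f})")  # stand-in for motion planning

SIDE = 0.04  # block side length in meters (assumed)

# Four colored blocks in a 2x2 grid, echoing the four squares of the logo.
layout = {
    "red":    (0.40, 0.10 + SIDE),
    "green":  (0.40 + SIDE, 0.10 + SIDE),
    "blue":   (0.40, 0.10),
    "yellow": (0.40 + SIDE, 0.10),
}

for color, (x, y) in layout.items():
    pick_block(color)
    place_block(x, y)
```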

The next breakthrough project comes from Google's AI labs. They have made advances in robot learning that show promise across a variety of robot tasks and can generalize to new scenarios. This is a game-changer for the robotics industry! Let me explain why.

One of the fundamental reasons why robots learn skills slowly is the lack of diverse data. It's like asking someone to play in the NBA after practicing in only a few games; that's a recipe for disaster. The same is true for robots. To be truly useful, they need to learn a variety of motor skills and adapt to different environments. In the past, getting large-scale datasets required either a lot of human involvement or a lot of engineering for autonomous data collection, and both of these methods are hard to scale up. It can take many months to gather enough data to train your robot. Who wants to wait that long? Enter Google's solution, ROSIE, a data augmentation approach that's set to revolutionize the robotics industry. ROSIE uses text-to-image models to generate meaningful data for robot learning without requiring additional robot data. In simpler terms, we can teach robots new skills by using AI-generated images to create new scenarios for robots to learn from. It's AI creating data for AI. But how does this work? ROSIE analyzes scenes from an existing video dataset, identifies areas of the scenes that need to be changed, and then uses inpainting to change only those areas while leaving the other elements intact. Inpainting is simply a way of modifying specific parts of an image or video. This creates new environments from which the robot can learn. Using our NBA analogy, it's like practicing a few thousand more times before making it to the league. For example, ROSIE inpaints a metal sink into the scenes during training, so that during testing the robot can interact with an actual sink even though it never encountered a real one while collecting data. This is an imagined experience that helps the robot learn faster. Talk about bringing your imagination to life. ROSIE uses Google's Imagen to generate the synthetic images for training. To put the importance of this work into context: it took 17 months and 13 robots to collect 130,000 demonstrations for training. With ROSIE, the demonstrations can easily be scaled up to millions! ROSIE shows impressive results compared to robots trained on less data, outperforming them on all tasks, especially in new scenarios. The possibilities are endless: we can take internet-scale data and distill it into robot experience, teaching robots to perform a wide range of tasks and adapt to different environments without months of tedious collection work. A rough sketch of the augmentation step is shown below.
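A few loudly stated assumptions for this sketch: Imagen is not publicly available, so Stable Diffusion inpainting (via Hugging Face's diffusers library) stands in for it; the segmentation mask marking the region to repaint is assumed to be given (ROSIE derives it automatically); and the file paths are hypothetical.

```python
# A rough sketch of ROSIE-style augmentation. Stable Diffusion inpainting
# stands in for Google's Imagen (which is not public); the mask marking the
# region to repaint is assumed given, and the file paths are hypothetical.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline  # pip install diffusers

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

def augment_frame(frame: Image.Image, mask: Image.Image, prompt: str) -> Image.Image:
    """Repaint only the masked region with a new object described by the prompt,
    leaving the robot arm and the rest of the scene untouched."""
    return pipe(prompt=prompt, image=frame, mask_image=mask).images[0]

frame = Image.open("demo_frame.png")      # a frame from a real robot demonstration
mask = Image.open("demo_frame_mask.png")  # white where the scene should change
new_frame = augment_frame(frame, mask, "a metal kitchen sink")
new_frame.save("augmented_frame.png")     # same robot actions, new-looking scene
```

Because the robot's actions in the episode don't change, the original action labels can be reused with the repainted frames, which is what lets one real demonstration fan out into many synthetic ones.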

Another research project that came out recently is an interesting robotics application from Nvidia and some top research universities. The idea is this: to help robots learn faster, the scientists created MimicPlay, a way for robots to learn by mimicking humans. Essentially, the robot observes how a person completes a task and then tries to imitate the approach. The solution is obviously much more complex than that. For one, there is a visual translation gap between a human hand and its capabilities versus a robot hand. To help the robot truly understand what it's observing, the scientists also operate the robot arm remotely for a smaller set of demonstrations, which helps the robot map what it sees humans do onto its own hand at test time. The advantage MimicPlay has is that we can quickly generate huge amounts of human examples, which helps the robot learn faster. There are other projects, like FISH, that also use imitation learning in clever ways to help robots pick up action paths much more easily. I'll share all the links below. These demonstrations may not look like much now, especially if you aren't aware of the real complexities of robotics, but all these examples are innovative ways to speed up robot learning. The shared recipe underneath them is imitation learning, sketched below.
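For readers new to the area, the simplest form of imitation learning is behavior cloning: regress the expert's actions from observations. The sketch below shows only that base idea, with random stand-in data and assumed dimensions; MimicPlay's real system layers a latent planner learned from human play video on top of this.

```python
# A minimal behavior-cloning sketch: regress expert actions from observations.
# Dimensions and data are stand-ins; real training uses teleoperated demos.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 64, 7  # e.g. a state embedding and a 7-DoF arm action (assumed)

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in dataset: in practice, (observation, action) pairs from demonstrations.
observations = torch.randn(1024, OBS_DIM)
expert_actions = torch.randn(1024, ACT_DIM)

for epoch in range(10):
    predicted = policy(observations)
    loss = nn.functional.mse_loss(predicted, expert_actions)  # match the expert
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```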

The future of robotics is looking bright: approaches like ROSIE and MimicPlay, along with language models like ChatGPT, are leading the way in making robots much easier to work with. With more data and higher-level code abstractions, robots can learn new skills and adapt to different environments much more quickly, making them even more useful. We're witnessing the beginning of the robot revolution. Who knows what the future holds?

Thanks for reading!

Resources

ChatGPT for Robotics
Blog: https://www.microsoft.com/en-us/research/group/autonomous-systems-group-robotics/articles/chatgpt-for-robotics/
Paper: https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf

ROSIE
Blog: https://diffusion-rosie.github.io/
Paper: https://arxiv.org/pdf/2302.11550.pdf

MimicPlay
Blog: https://mimic-play.github.io/
Paper: https://mimic-play.github.io/assets/MimicPlay.pdf

FISH
Blog: https://fast-imitation.github.io/
Paper: https://arxiv.org/pdf/2303.01497.pdf


Written by Taffy Das

Check out more exciting content on new AI updates and their intersection with our daily lives: https://www.youtube.com/channel/UCsZRCvdmMPES2b-wyFsDMiA