TechCrunch and Peter Chen project that AI robotics’ ‘GPT moment’ is (very) near. The following excerpt from TechCrunch’s article, authored by Covariant’s Peter Chen, provides technical insight into how the same three core pillars of AI that enable the success of large language models (LLMs) like OpenAI’s GPT are also driving the surge of scalable AI-powered robots. Peter explains how robots that can learn to interact with and adapt to the physical world will unlock efficiencies on the scale of those we’ve seen in the digital world over the past few decades.
Peter has published over 30 academic papers that have appeared in the top global AI and machine learning journals. Learn more about our team’s research here.
The same core technology that allows GPT to see, think, and even speak also enables machines to see, think, and act. Robots powered by a foundation model can understand their physical surroundings, make informed decisions, and adapt their actions to changing circumstances.
1. Foundation model approach
By taking a foundation model approach, you can also build one AI that works across multiple tasks in the physical world. A few years ago, experts advised building a specialized AI for robots that pick and pack grocery items, a separate model to sort various electrical parts, and yet another to unload pallets from a truck.
The paradigm shift to a foundation model means building one generalized AI for all of these scenarios instead. Such a model responds better to the edge cases that pervade unstructured real-world environments and would otherwise stump models with narrower training; it’s by training on everything that you get the human-level autonomy missing from previous generations of robots. A minimal sketch of what this looks like in code follows below.
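To make the contrast concrete, here is a minimal sketch of what “one AI across tasks” can mean in practice: a single policy network conditioned on a task embedding, rather than a separate specialized model per task. The names, dimensions, and architecture here are illustrative assumptions for the sketch, not Covariant’s actual system.

```python
# Minimal sketch (illustrative, not a real production architecture):
# one policy shared across tasks, selected by a task-conditioning input.
import torch
import torch.nn as nn

class MultiTaskPolicy(nn.Module):
    """One model for picking, sorting, and unloading; the task embedding
    tells the shared network which job it is doing."""
    def __init__(self, obs_dim: int, task_dim: int, action_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(              # shared representation
            nn.Linear(obs_dim + task_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.action_head = nn.Linear(256, action_dim)

    def forward(self, obs: torch.Tensor, task: torch.Tensor) -> torch.Tensor:
        x = torch.cat([obs, task], dim=-1)          # condition on the task
        return self.action_head(self.backbone(x))

# The same weights handle every task; only the conditioning input changes.
policy = MultiTaskPolicy(obs_dim=512, task_dim=64, action_dim=7)
obs = torch.randn(1, 512)        # e.g., encoded camera features (assumed)
pick_task = torch.randn(1, 64)   # e.g., embedding of "pick the red onion"
action = policy(obs, pick_task)  # e.g., a 7-DoF arm command
```

The design point is that edge-case experience from one task (say, handling a crumpled bag in grocery picking) flows into the shared backbone and benefits every other task, which a per-task specialized model cannot do.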
2. Training on a large, proprietary, and high-quality dataset
Teaching a robot which actions lead to success and which lead to failure is extremely difficult. It requires extensive high-quality data based on real-world physical interactions. Single-lab settings and video examples are neither reliable nor robust enough as sources (e.g., YouTube videos fail to capture the details of physical interaction, and academic datasets tend to be limited in scope).
Unlike AI for language or image processing, no preexisting dataset represents how robots should interact with the physical world. Assembling a large, high-quality dataset is therefore a harder challenge in robotics, and deploying a fleet of robots in production is the only way to build a diverse one.
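As a loose illustration of what fleet-scale data collection might capture, here is a hypothetical per-interaction record; the field names are assumptions for the sketch, not an actual production schema.

```python
# Hypothetical shape of the data a production fleet could log per interaction.
# All field names are illustrative, not a real schema.
from dataclasses import dataclass, field

@dataclass
class InteractionEpisode:
    robot_id: str       # which robot in the fleet produced the data
    sku: str            # the item being handled
    observations: list  # e.g., camera frames or encoded sensor features
    actions: list       # the commands the robot actually executed
    success: bool       # outcome label: did the pick/place succeed?
    metadata: dict = field(default_factory=dict)  # site, lighting, gripper, ...

# Every deployed robot appends records like this. Aggregated across many
# sites and millions of SKUs, the result is the diverse, physically grounded
# corpus that no single lab or video scrape can provide.
```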
3. Role of reinforcement learning
Similar to answering text questions with human-level capability, robotic control and manipulation require an agent to make progress toward a goal that has no single, unique, correct answer (e.g., “What’s a successful way to pick up this red onion?”). Once again, pure supervised learning is not enough.
Succeeding in robotics requires deep reinforcement learning (deep RL). This autonomous, self-learning approach combines RL with deep neural networks to unlock higher levels of performance: the AI automatically adapts its learning strategies and continues to fine-tune its skills as it encounters new scenarios.
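The following is a minimal sketch of the core deep RL loop, using a REINFORCE-style policy gradient on a toy one-step grasp task. The environment, reward, and network are stand-ins invented for illustration, not a real robot stack; the point is the loop itself: act, observe the outcome, and reinforce what worked.

```python
# Minimal deep RL sketch (REINFORCE policy gradient) on a toy grasp task.
import torch
import torch.nn as nn

class ToyGraspEnv:
    """Hypothetical one-step task: choose one of 4 grasp poses; one succeeds."""
    def reset(self) -> torch.Tensor:
        self.best = torch.randint(0, 4, (1,)).item()
        # The observation encodes which grasp is best, so the task is learnable.
        return nn.functional.one_hot(torch.tensor(self.best), 4).float()

    def step(self, action: int) -> float:
        return 1.0 if action == self.best else 0.0  # reward: grasp success

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
env = ToyGraspEnv()

for episode in range(500):
    obs = env.reset()
    dist = torch.distributions.Categorical(logits=policy(obs))
    action = dist.sample()                  # explore: try a grasp
    reward = env.step(action.item())        # observe success or failure
    loss = -dist.log_prob(action) * reward  # reinforce actions that succeeded
    opt.zero_grad()
    loss.backward()
    opt.step()
```

No one labels the “correct” grasp here; the policy discovers successful behavior from trial and reward, which is exactly what supervised learning alone cannot provide when a task has many valid answers.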
4. Challenging, explosive growth is coming
In the past few years, some of the world’s brightest AI and robotics experts laid the technical and commercial groundwork for a robotic foundation model revolution that will redefine the future of artificial intelligence.
While these AI models have been built similarly to GPT, achieving human-level autonomy in the physical world is a different scientific challenge for two reasons:
- Building an AI-based product that can serve a variety of real-world settings comes with a remarkable set of complex physical requirements. The AI must adapt to different hardware applications, as it’s doubtful that a single hardware design will work across various industries (logistics, transportation, manufacturing, retail, agriculture, healthcare, etc.) and across the activities within each sector.
- Warehouses and distribution centers are an ideal learning environment for AI models in the physical world. It’s common to have hundreds of thousands or even millions of different stock-keeping units (SKUs) flowing through any facility at any given moment — delivering the large, proprietary, and high-quality dataset needed to train the “GPT for robotics.”