Why Robots Still Struggle With Simple Tasks (And What Might Finally Change That) | Karol Hausman, Co-Founder & CEO of Physical Intelligence
Karol Hausman is the co-founder and CEO of Physical Intelligence, a robotics company building a general-purpose “AI brain for the physical world.” The company has raised more than $1 billion in funding to develop foundation models that allow robots to operate across many machines, environments, and tasks rather than being programmed for a single purpose. The core thesis: the same scaling dynamics that transformed language models may also unlock robotic intelligence. But only if you resist every commercial pressure pushing you toward specialization. The central challenge isn’t mechanical design. It’s intelligence: how robots learn, generalize, and interact with a physical world that is far harder to simulate than it is to describe.
- Uploaded May 26, 2026
Full transcript
Speaker A: The more robots you deploy, the better the models should get, because I think the models will keep on getting better. They'll be able to absorb more and more data. As I started looking into robotics and started studying it, it was very disappointing. It seemed like there are no intelligent robots out there and no one was working on them. Every robot you would see would be this pre-programmed machine that just goes from A to B and does it repeatedly and has no intelligence whatsoever. That was maybe like a wake-up call.
So many people excited about this future that we saw in the movies that we read about in the books, and it seems like no one is working on it. We had this one particular experiment with a Coke can in front of a robot and 3 pictures of different celebrities in front of it. And the prompt we gave to the model we built was "put the Coke can on picture of Taylor Swift." The robot picked it up and then slowly moved it towards Taylor Swift, all from internet data. That was the moment where it clicked for us that you can bring in a lot of prior knowledge from LLMs, from the internet, and connect it to robot motions.
And it felt like it opened another door. Maybe, just maybe, if we do everything right, if we combine it with internet knowledge, if we scale it up, if we do all of the pieces that need to be done, it might work. And at that point, it became clear that the way to accomplish this is to create an organization whose sole purpose is to solve physical intelligence. Speaker B: We taught machines to beat the world's best chess players long before we taught them to reliably fold a towel or carry a coffee cup.
Speaker C: That paradox is at the heart of the robotics industry, responsible for many of its false dawns over the years. Speaker B: Karol Hausman believes that he's found the way forward. Speaker C: The CEO of Physical Intelligence has raised more than $1 billion in venture funding to build an AI brain for the real world, one that helps robots generalize accurately across different tasks and environments. Speaker B: He's doing so by taking a very different approach from many of his best-funded competitors.
Speaker C: Focusing on gathering data from actual interactions rather than through simulations. Speaker B: In today's episode, Karol and I discuss lessons learned from The Inner Game of Tennis, why reinforcement learning is making a comeback, and the challenges of creating true physical intelligence. I'm Mario, and this is The Generalist. Speaker C: This episode is brought to you by Brex. If you're a founder, the hardest part isn't the idea. It's scaling fast without getting buried in back office work. Speaker B: That's where Brex comes in. Speaker C: Brex is the intelligent finance platform for founders.
With Brex, you get high-limit corporate cards, easy banking, and high-yield treasury, plus a team of AI agents that handle manual finance tasks for you. They take care of things like expenses, all according to your rules, so you can move faster while staying in full control. 1 in 3 startups in the US already runs on Brex. You can too at com/mario. I'm really excited about today's sponsor, Granola. Simply put, Granola is the AI notepad for people in back-to-back meetings. I've been using Granola for over a year now, and honestly, it's a tool that has transformed the way I work.
Granola takes meeting notes for you without any intrusive bots joining your calls. You can jot down rough notes like you always do, and in the background, Granola transcribes and turns those notes into clear, useful notes when the meeting ends. You can also chat with your notes, which is one of my favorite features. If someone says something on the call that you didn't quite catch or want to learn more about, Granola can help you out. It's an amazing way to be better informed during a conversation without having to interrupt everyone else's flow.
You can also have Granola review all your recent conversations to pull out to-dos, write a weekly recap, or surface interesting ideas you might have forgotten. Another thing I love. To get started with Granola, head to ai/mario. And for new users, you can get 3 months free with the code Mario. So go to ai/mario and use code Mario for 3 months free. Speaker B: I'm so excited to have this conversation today. Uh, in, in studying what you're building with Physical Intelligence. Naturally, much of the conversation is around the technology, but, uh, the more I, I looked into your story and the company, it feels like what you're building is really personal and connects to some of your, your background in philosophy and all these other things.
Uh, so I'd love to start, you know, maybe with some of, some of that first. When were your earliest memories of being excited by robots? Speaker A: Probably with watching movies as a kid. I think I was a big Star Wars fan. I was reading all the books, watching all the movies multiple times, and I really loved the way that robots were portrayed in that world, the diversity of them, different functionalities, the way they interacted with people, the way that there are good robots and bad robots. But I don't know if I can pinpoint a specific point.
Um, I remember just, just growing up being fascinated by that concept that you can build something that is so human-like. And I always thought that if you were able to do that, it would allow us to understand ourselves a little bit better. And I was fascinated by kind of the things that I think people are just fascinated by as they're growing up, um, you know, asking deep questions about the nature of reality, how the brain works, what does it mean to think? And I thought that robots probably are going to have some answers to these questions because reading other kinds of books or studying philosophy didn't have good answers to these questions.
Then more importantly, as I started looking into robotics and started studying it, the first impression I had was that it was very disappointing in that every robot you would see would, or basically it seemed like there are no intelligent robots out there and no one was working with them. Every robot you would see would be this pre-programmed machine that just goes from A to B and does it repeatedly and has no intelligence whatsoever. So I think that was maybe the most defining moment for me when I saw that, because that was like a wake-up call of, you know, there's so many people excited about this future that we saw in the movies that we read about in the books.
And it seems like no one is working on it. Speaker B: You said that, you know, in some ways you had, you know, that you're asking the classic questions of every child about the nature of the universe and the human mind. Uh, I don't know if that's as universal as, as you might imagine it to be. I think probably there are some kids who really become fascinated by that, but maybe not all. Were your parents scientists? Were there things that they did that sort of encouraged that natural curiosity that you must have had?
Speaker A: Not really, no. My dad was a mechanic and my mom was an entrepreneur. So not really, there wasn't a lot of science at home. I think it was mostly coming from watching movies and being fascinated by it. I also went to, so I come from a small town in Poland. I went to a high school that had very high talent density for that area. And I remember that being then an environment where a lot of these questions were being asked and it was very encouraged. So I found a group of friends that were really fascinated by similar things.
And I think that's what really increased my interest in that area. It was just a thing that you would talk about in high school. Speaker B: You had a tweet that I saw where you were talking about Fei-Fei Li's biography and how it sort of spoke to you in some sense that, you know, the immigration story and also the fact of, you know, having teachers that believed in you along the way were sort of parallels to your own life. In high school, was that when you sort of had that first batch of those teachers who maybe saw some of that, you know, really remarkable early talent?
Speaker A: I think it was throughout. It wasn't just high school. It was throughout. There were multiple people that, now looking back, I'm really grateful to, who either saw something or just encouraged me to dig deeper or motivated me or gave me some opportunities that I wouldn't have otherwise. And I think at every single stage there was someone I could point to, and reading Fei-Fei Li's book just reminded me of that. And I remember after I read it, I pinged some of them and thanked them for everything they did for me.
So yeah, it was, it was a long journey from Poland through Germany to the US, going through multiple places in the US. And it felt like at every single stage of the journey, there was someone kind of rooting for me and helping me. Speaker B: You know, before you realized that you wanted to work in robotics in particular, as an undergrad, you studied sort of computer science and philosophy, but I'm curious if there were interests before that, like as a kid, were you thinking, I'm going to be an astrophysicist or, I don't know, a neuroscientist?
Speaker A: Yeah, I was, I was really into physics actually. Right after high school, I applied to a lot of physics and math programs and I thought I was going to be an astrophysicist or a theoretical physicist. I was really fascinated by it. I was reading a lot of books. I was really into physics in high school. But then a lot of my friends decided to go for engineering degrees. And we were all gonna go to Warsaw. And I thought that I could maybe try to study both, or I could start with engineering and then go back to physics if it turns out to be boring.
It wasn't like a very intentional choice at that time. It was more like, these are really smart people. I really wanna stay with my friends, and robots are really, really cool. So let me try that and then see how it goes. So I went to Warsaw to study what was called mechatronics at the time, which was basically like a mixture of electronics, some mechanical engineering, computer science. Then I also studied philosophy, but I think that's when, once I moved to Warsaw, that's when I started realizing that robotics is a thing, but it's a thing that is very disappointing.
I remember, studying robotics in Warsaw, kind of the dream job you could get was being a roboticist at an ABB factory. That was right next to Warsaw. That was like the job that everybody was competing for. If you could work for ABB on the production line, that's like, that's how you made it in the robotics industry. And I remember I went there once and saw that production line and it was the most uninspiring thing I've ever seen with robots. As I said before, robots just moving from A to B, not being intelligent whatsoever.
And I think that's when it started clicking for me that I would really like to do something beyond that. And it seemed like no one was working on it, which sparked my interest even more. So that's where I started digging into who is doing intelligent robots and is it even a thing? And that's then what defined, I think, the rest of the journey of trying to find those people, move from place to place as I try to find them, and then eventually start working on it myself.
Speaker B: The sort of study of philosophy, as you've looked back on that time, you know, maybe it's a, a line on the CV at this point and you're like, you know, I, I don't reflect on it necessarily that much, but I'm curious if there are writers that you've ended up coming back to more and more as you start to do this sort of work around, I don't know, the nature of consciousness or the, you know, the certain senses that make us more or less human. Speaker A: It was a really interesting period.
So I think there was a few realizations I had while studying it. I think one was that a lot of philosophy is the history of philosophy. It's not philosophy itself. It's talking about what others thought about years ago. From the outside looking in, it always sounds like Plato or people like that, they were these geniuses that kind of like figured everything out way before their time. But then when you actually read their works, and that was, you know, part of the deal of studying philosophy, you realize that there's a very small percentage of things that kind of sound right.
And there are a lot of things that they said that were absolutely wrong. Yes. And that was quite eye-opening that, you know, the history of philosophy shows that there are people who made many, many mistakes along the way and thought that the world worked a certain way when it's now very clear that it doesn't. They had some really interesting observations, and I think we often over-index on those, but they were wrong a lot. There are still some philosophers that I remember studying that I think were very relevant and I was a big fan of.
One of them is Spinoza. That was, I think, my favorite philosopher when I, when I look back at what I studied. But then there were some subjects, very, very few that were more about philosophizing today rather than looking at the history. And I think those were the ones that were the most fun for me. There was this one subject called ontology, which was a study of things of how they are. I think it's even difficult to comprehend what this means, but it was one of those subjects where you just like sit and contemplate reality.
It just was very fun, even though it was very unstructured and it wasn't clear what is right and what isn't. Um, it just seemed like they touched something real. And probably that's the subject I remembered the most. That's where I had the most fun. Um, much more so than the history of philosophy. Speaker B: So interesting. And, and these things I love to explore because there is, to me at least, such deep parallels between some of the things that you're trying to build to fruition. Why Spinoza? What was it about him that seemed deep to you or true?
Speaker A: The one thing that really resonated with me was this idea that, you know, at that time a lot of philosophers were contemplating the nature of God and the nature of reality. And there was often this idea that God is this separate entity that either was the creator of everything or had some kind of way of controlling everything that is happening in the universe. And I think with him, what was so, um, interesting was that he said that everything around us, like all of reality, that is God.
That is, that is what it is. And there is some underlying structure to it. And that structure in and of itself is where the beauty lies and where the, where the intelligence lies. And I, and I thought that was very thought-provoking. And as you study physics and, you know, things like robotics or AI, I think more and more you realize that there's some underlying structure to things. Uh, and in some ways that structure is quite counterintuitive, like why, why it's there. Why is it so that you can describe all the complexity of the world in a simple equation rather than in a million different equations?
And the more beautiful and short that equation is, the more real it seems. Or the more accurate it is. And I thought this was kind of the closest to what it feels like today, given the current state of science. Speaker B: You had a tweet where you were talking about a book that I tried to read as much of it as I could in preparation for this called The Inner Game of Tennis. And you sort of mentioned how, I mean, you were really talking about tennis, but it seemed like some deep parallels to robotics.
You sort of make the point that they talk about the fact in this book that if you really focus with your conscious mind on improving your stroke, you sort of get to this local maximum where maybe you are doing a little bit better at hitting a forehand, but really what you need to do to get to an optimal stroke is sort of allow the unconscious mind, uh, to, to take over in some respect. How do you think about emulating this aspect of almost the unconscious and the role it plays in fluid motion and I don't know, proprioception and all these sort of things when it comes to robotics.
Is that— were you thinking about robotics when you were reading that book? Speaker A: A lot. Yeah, definitely I was. I don't think it just applies to motion. I think it's— there's a big difference between how we learn things and how we think we learn things or how we teach things to others. You know, even if you look at something like learning a language, The way we thought you're supposed to teach it or the way we would want to teach it to others in school is by defining all the rules and, you know, you first learn about grammar and all the different concepts and how tenses work and what word follows what other word.
So you learn all of that structure, and the thinking is that if you know all of the structure, you can follow those rules and then speak like a native speaker would. But that's not how we learn language. Right? Like you're just immersed in it. You just hear everybody using the structure that they kind of intuitively know. And then all of a sudden you emerge with a full understanding of how language works, even though you can't really pinpoint any of these rules. There are many native speakers that speak perfectly, follow all the rules exactly, but have no idea what these rules are, what the underlying structure is, what declension is, what tenses are.
They have no idea of any of that. But they still follow them perfectly. With that book, I think that was shown, but in the motion aspect of things where, you know, you could read all you want about how to play tennis or you can try to describe, you know, where your elbow should be exactly where you hit the forehand or how to adjust your swing. But the only way to really learn it is by just doing it and kind of immersing yourself in it. And I think there was a lot of parallels to the robotics world and, and to the world of AI, how we, how we thought that the machines should think or how they should learn things versus how they actually learn them.
Speaker B: You, I think in 2023, say that you had a moment where for the first time you really felt like you could see a bit of the future of the robotics industry, uh, and things were really clicking into place, but you had really been in that field for quite a while already. It strikes me. So I wondered what the period before that moment was like. Was that quite a lonely time to be working in a space where you don't know how it's going to play out? Speaker A: So maybe I can go back and tell you a little bit more about the journey.
Speaker B: I would love that. Speaker A: So you understand kind of the feeling. After I moved to Warsaw, as I mentioned, it was very disappointing to see the state of robotics. And for a long time, it felt like no one was working on the intelligent robots that I really wanted to work on. And I remember at the time I was searching for really anything, anybody who is doing something that is a little bit closer to what I imagined robots to be. And for a long time it felt like no one was. Then I remember I finally found someone.
I found a professor in Germany that was writing papers that seemed a little bit closer to what I would imagine robots to be. And there was this master's program. I mean, in Munich called Robotics, Cognition and Intelligence. And I thought if I apply for this program and get in and they don't do intelligent robots, then I'm basically lost. Like if the program with that title doesn't do anything that I imagined, then something is very, very wrong. So I applied and moved to Munich and I remember the first day of classes I went to one of the very first classes and I got stuck in a metro somewhere and was super late, so missed the entire class.
And I showed up, no one was there. The class was over at that point, but I saw a janitor and I was looking for a bathroom, so I asked him where the bathroom is. He pointed me to the third floor of the building. I remember it very vividly. And as I went upstairs, I remember seeing these big, big doors and there was some mechanical sound coming from behind them. I don't know why, but I just decided to like open them and see what's going on there. The sound was very intriguing.
And as I open these doors, I see these two basically humanoid robots doing— one of them was making popcorn and the other one, I believe, was spreading butter on a toast. And that was the moment where, you know, this is something I've been searching for at this point for years. So that was the moment that, that was probably one of the most powerful moments I had during that journey where I finally found it. There's robots that look intelligent, that do something that is very human-like. There are people who work on this.
So I immediately left the room, knocked, because there's basically no one there. And I knocked at the room next door and asked if I could work there. And basically I was completely unqualified. I had no idea how to program these robots. I was not familiar with any of the tools, but the person who was there, it was another one of those people that gave me a chance and, and asked me to just come in on Monday and see what I could do. Wow. Um, I got involved, got a job at that, at that lab.
That's how I got into finally working on intelligent robots. So yeah, I think this is like yet another person that I'm really grateful to. And then afterwards I, I moved to the US, got into a PhD program studying intelligent robots. And at that point, I was so stoked that I finally get to do this and I finally found a small group of people that work on those things. It didn't matter that it doesn't work that well. I was just so excited that I finally get to meet the people that are working on this.
I can learn from them. I can push it forward and I can finally work on the thing that I always wanted to work on. So I did this throughout my PhD and I had another moment like this during my PhD, this kind of defining moment where I was working on a certain set of problems where, uh, it was referred to as active perception. And I can, I can go a little bit deeper into this if you're interested. But, um, at that point it kind of felt like there are ways to write, to, to, to write papers, to, to progress in your PhD, but there isn't anything that felt like it could actually solve this problem.
It all felt like the world is too complex. You can push it to some extent, this technology, but you won't really get to the finish line. They're not, they're not kind of adding up. You don't really see a path out of it. And there was another moment where a postdoc candidate stopped by our lab and at that time when a postdoc candidate stops by, you're supposed to, as a PhD student, show them your work and kind of talk with them as part of the interview. So I showed to that postdoc candidate stuff that I was working on and he listened to all of it.
And then at the end of it, he said that I should drop all of this and switch to deep learning. That's the way to solve the problem that I was actually working on. At that point, I just thought to myself that he had no idea what he's talking about and he probably didn't listen to me. Because at that point I think I spent already 2 years or something like this working on the topic I was working on. But then later that day I went to his lecture where he showed what he was working on.
And that was another one of those moments where it was extremely eye-opening because for the first time I saw something that like could actually work. Where it was less of like, you know, here's this little paper over there and a little paper over there, but they don't add up. But it's something that kind of brought all the pieces together. And yeah, that was another one of those moments, like the similar one to the one I've experienced in Munich, where at that point I decided to drop my PhD topic, change it completely, start collaborating with that person, and do everything I can to push deep learning in robotics.
And that person was Sergey Levine, who is now my co-founder at Physical Intelligence and one of the pioneers of deep learning in robotics. And yeah, I never looked back after that. But to kind of, this is a long way of answering your question, but it didn't feel lonely in that, you know, I was just really excited to be working on that problem. Speaker B: Those are such good stories there. The first one is almost like, you know, when a wizard is accepted to Hogwarts or something, it's like you finally found your, you know, collection of magical people and the magic that they were doing.
The Sergey story is also so interesting to me because you seem to have gone from skeptical to convinced very, very fast. Like what was the cycle on that? Was that like literally the same day, the same week? Speaker A: I think it was within that lecture. I think like in the middle of that talk, I was like, yep, this makes sense. This is 100% it. Whatever I was doing is wrong. This is the right way. I want to do everything I can to work on this topic. And I had feelings like this.
I think every researcher has a feeling like this every now and then where they just see something and it clicks. And at that point, it's a very powerful feeling. You kind of want to drop everything you've been doing and you just feel like you found something that is much closer to truth than what you thought before. I think we all had a similar moment afterwards when we were working on combining large language models with robot learning. And that was, I think, another of these powerful moments where like things start to click.
Speaker B: That was the Taylor Swift demo? Is that the one? Speaker A: That's right. Speaker B: Yeah, maybe you could tell that story because that is a fascinating one. Speaker A: Yeah, so what happened afterwards is I started working with Sergey. There was a few of us who were really getting into that field. It kind of felt like we were a small group of renegades within the robotics community because there were a lot of problems with deep learning at the time. And some of these problems remain. It was not interpretable.
There was no way of really making it modular. It seemed like nobody fully understood how it worked. It wasn't very sample efficient and there were all of these problems. So at that time it was extremely unpopular to write any deep learning in robotics paper. And it was kind of like between these two worlds of machine learning world where any robotics paper seemed kind of like the wrong paper for that venue and the robotics world that never fully embraced or didn't embrace deep learning at that time. But it was still really cool to work with a small group of people and push these methods forward.
So then afterwards, the only place that was really embracing that view was Google Brain. So I decided to join Google Brain right out of my PhD, continue working with Sergey and others on those set of methods with the basic premise being that the way to really get it to work is to scale it up. So we're scaling it up, we're trying to figure out how to really get it to work at scale. But again, it started feeling like there is so much complexity in the world that if robots need to— the only way for robots to learn all of this complexity is to experience all of it firsthand.
It'll be very difficult to scale. It'll be very difficult to have them learn about logic and how to break down a task if they had to do it all by themselves. So then I think around 2022, I would need to look back to see what year it was exactly. We started combining it together with Brian Ichter, my other co-founder, and we started combining these robotic methods with large language models. This was before the ChatGPT moment. This is where people just started experimenting with large language models and kind of understanding them better and better.
There was this one particular demo. So we were really excited about this because there was one— we thought that this would be a path to bringing in a lot of prior knowledge that robots didn't experience firsthand that we learned from the internet into the robotics world. And that would kind of solve this problem of you having to collect so much data to understand how the world works, because that understanding is already embedded in large language models. So if you figure out a way to combine these two, that should really, really help.
And we had this one particular experiment where we were testing how much transfer do you get from this internet-scale knowledge to robotic behavior. Usually when you work on robots, you work on this very specific task and then you test that specific task. So you're very rarely surprised. It's mostly disappointing because you really want this task to work. You've been working on this task for a very long time and then it usually doesn't work. But if you work hard enough, eventually you get it to work. But very rarely you are in a situation where you didn't expect the task to work or it's not the task that you're working on and it works.
So very rarely you're positively surprised. So we set up a few of these experiments trying to test how much knowledge transfers. And one of them was this experiment with a Coke can in front of a robot and 3 pictures of different celebrities in front of it. And the prompt we gave to the model we built that combined LLM and robot models was put a Coke can on picture of Taylor Swift. And one of the pictures was Taylor Swift. And it's, you can see this video on the internet. It's a pretty pathetic demonstration of what robots could do at the time, but the robot picked it up and then slowly moved it towards Taylor Swift.
And that was another one of these moments of huge, huge excitement, even though if you watch the video, it's totally unimpressive, because the robot models had never had any Taylor Swift in their data. The model had to understand the concept of Taylor Swift, connect it to the image of Taylor Swift, and then connect it to the right motion that would move the Coke can to the picture of Taylor Swift, all from internet data. So that was the moment where it clicked for us that it would actually work, where you can bring in a lot of prior knowledge from LLMs, from the internet, and connect it to robot motions.
And even though the demonstration itself was very unimpressive, just the fact that you can combine these two knowledge sources, that was really, really, really impressive for us. And it felt like it opened another door. Speaker B: That was the Taylor Swift demo? Speaker A: Is that the one? Speaker B: That's right. Yeah, maybe you could tell that story because that is a fascinating one. Speaker A: Yeah, so what happened afterwards is I started working with Sergey. There was a few of us who were really getting into that field. It kind of felt like we were a small group of renegades within the robotics community because there were a lot of problems with deep learning at the time.
And some of these problems remain. It was not interpretable. There was no way of really making it modular. It seemed like nobody fully understood how it worked. It wasn't very sample efficient, and there were all of these problems. So at that time it was extremely unpopular to write any deep-learning-in-robotics paper. And it was kind of stuck between two worlds: the machine learning world, where any robotics paper seemed like the wrong paper for that venue, and the robotics world, which hadn't embraced deep learning at that time.
But it was still really cool to work with a small group of people and push these methods forward. So then afterwards, the only place that was really embracing that view was Google Brain. So I decided to join Google Brain right out of my PhD and continue working with Sergey and others on that set of methods, with the basic premise being that the way to really get it to work is to scale it up. So we're scaling it up, we're trying to figure out how to really get it to work at scale.
But again, it started feeling like there is so much complexity in the world that if the only way for robots to learn all of this complexity is to experience all of it firsthand, it'll be very difficult to scale. It'll be very difficult to have them learn about logic and how to break down a task if they had to do it all by themselves. So then, I think around 2022 (I would need to look back to see what year it was exactly), together with Brian Ichter, my other co-founder, we started combining these robotic methods with large language models.
This was before the ChatGPT moment. This is when people had just started experimenting with large language models and were understanding them better and better. We were really excited about this because we thought that this would be a path to bringing a lot of prior knowledge that robots didn't experience firsthand, knowledge learned from the internet, into the robotics world. And that would kind of solve this problem of having to collect so much data to understand how the world works, because that understanding is already embedded in large language models.
So if you figure out a way to combine these two, that should really, really help. And we had this one particular experiment where we were testing how much transfer do you get from this internet-scale knowledge to robotic behavior. Usually when you work on robots, you work on this very specific task and then you test that specific task. So you're very rarely surprised. It's mostly disappointing because you really want this task to work. You've been working on this task for a very long time and then it usually doesn't work. But if you work hard enough, eventually you get it to work.
But very rarely are you in a situation where you didn't expect the task to work, or it's not the task that you're working on, and it works. So very rarely are you positively surprised. So we set up a few of these experiments trying to test how much knowledge transfers. And one of them was this experiment with a Coke can in front of a robot and 3 pictures of different celebrities in front of it. And the prompt we gave to the model we built, which combined LLM and robot models, was "put the Coke can on the picture of Taylor Swift."
And one of the pictures was Taylor Swift. And you can see this video on the internet. It's a pretty pathetic demonstration of what robots could do at the time, but the robot picked it up and then slowly moved it towards Taylor Swift. And that was another one of these moments of huge, huge excitement, even though if you watch the video, it's totally unimpressive, because the robot models had never had any Taylor Swift in their data. The model had to understand the concept of Taylor Swift, connect it to the image of Taylor Swift, and then connect that to the right motion that would move the Coke can to the picture of Taylor Swift, all from internet data.
So that was the moment where it clicked for us that it would actually work, where you can bring in a lot of prior knowledge from LLMs, from the internet, and connect it to robot motions. And even though the demonstration itself was very unimpressive, just the fact that you can combine these two knowledge sources, that was really, really, really impressive for us. And it felt like it opened another door.
Speaker B: Yeah, it feels like, I don't know if you're familiar, I'm sure you are. I can't believe I'm asking if you're familiar, you know much more about this than I do.
But there was some experiment in, like, the '70s, I think called SHRDLU or something like that, where the entire approach was to try and teach AI all the rules of the world programmatically. And to say, you know, here is the category of what a bird is. It has these characteristics. And then here's the category of a bat. It has some overlapping characteristics, you know, all of these sorts of things. But essentially, people are familiar with this concept now because of the way we use LLMs, but it was almost as if the robot suddenly inherited all of these rules and pieces of knowledge by being tied to the LLMs, and that's what allowed the Taylor Swift demo to happen.
Speaker A: Yeah, that's exactly right. And I think there's maybe two points there. One is that we made the same mistake in robotics. We wanted to write all of these rules. We thought that if only we had enough of these rules, the robots would be able to follow them and do the right thing. But kind of like we said about The Inner Game of Tennis, you can't just write all of the rules, you kind of have to do it. And there is some underlying structure, but you can't fully put your finger on what it is.
You need to learn it from data. And I think people thought about it similarly in language as well. They thought that if only we could write all of the rules, that would be enough, but it turns out that there are billions or trillions of these rules and sometimes we can't even fully express them in language. You just need to learn them, and if you do, then you would be able to follow them even though you still don't fully understand what they are. So I think we're learning this lesson over and over again.
Speaker B: And so, you have this almost third eureka moment for yourself where the technology has taken yet another jump. By that point, it sounds like you have most of the Physical Intelligence co-founders around the table, but how did you sort of pull the last few members aboard and decide to make that leap?
Speaker A: Yeah, I think at that point it started becoming clear that it could be possible. And, you know, if you worked on something for so long, at that point, that was basically my entire adult life, 15 years or so. And for a long time you thought that there was no solution to this problem. There were some times where it felt a little bit more tangible, but it never felt like it could actually be solved. And then you have this moment where for the first time you see the light at the end of the tunnel. Like maybe, just maybe, if we do everything right, if we combine it with internet knowledge, if we scale it up, if we do all of the pieces that need to be done, it might work.
If you see that light at the end of the tunnel, you can't unsee it. You want to do something about it. And at that point, it became clear that the way to accomplish this is to create an organization whose sole purpose is to solve physical intelligence, is to solve this problem. It can't be solved as like priority number 20 in another organization. The only reason to exist for this organization would be to solve physical intelligence. So then the next question is how do we actually do it and what are the conditions to make it happen?
And what became clear immediately is you need to have truly the best people, truly the best researchers to make it happen because the second best team in those areas doesn't really work with a company like that. You need to have the top, top people in the field. So that one actually wasn't that hard because we were already working with each other for a very long time. So I happened to be friends with the best people in the field already. So it was just a matter of making sure that we all want to do it and we're ready for it.
The other condition was that you need to get access to a lot of funding. It would require a very long-term bet and investors that are fully aligned with this taking some time, with this starting as a research company and not being oriented around short-term revenue. So that was the second requirement. And the third requirement was to build an incredible company with the right focus, with the right people, with the expertise in all the other areas that we didn't have expertise in, like hardware operations and things like this.
I spent most of my time figuring out these two points. How can we find the right investors that can really help us in this adventure and are fully aligned with how we want to do this? And then how can we find the right people to fill all the missing pieces?
Speaker B: How did Lachy Groom join the crew? Because he's obviously had a very impressive career as an operator, as an investor, but isn't someone who one would automatically think, you know, this is someone who's going to devote the next chapter of their life to robotics.
Speaker A: So I didn't know Lachy before we started thinking about starting a company, but then, and he should tell this story rather than me, but I believe what was happening is that he'd been investing at that point for a few years. And as long as he'd been investing, he always thought that he didn't want to be investing forever. He wanted to find something else that he could fully devote himself to, and he wanted to build things. And the one area he was always fascinated by was robotics. And he was seeing over the past few years that there were a lot of innovations happening there.
Robotics was having its moment. And in particular, he was seeing over and over papers coming from Chelsea Finn and Sergey Levine's labs, as well as our papers from Google. So he told a lot of his friends: if ever Chelsea, Sergey, or Karol or anybody from those teams is thinking of starting a company, please connect me to them. I would at least want to invest or at least talk to them. So that's when we got introduced by a friend, and we pitched to Lachy as a way to have him invest.
And at the end of that pitch, it became clear that he would like to do much more than just invest. This is the opportunity he'd been looking for for all the years he'd been investing. And he really wanted to dive in. So then the next step was us trying to get to know each other as quickly as possible, spend as much time together as we could. And then as we were doing that, it also became clear that this is the person that I'd been looking for, who could help us on the commercial side, on the fundraising side, on the operations side, kind of one of the missing links that we really needed.
Speaker B: Were there certain philosophical alignments that you needed to find in someone who you hadn't worked with before? What were the sort of, you know, even beyond maybe the tactical pieces, the traits that you saw in him that made you feel like, yes, this is someone I can bring into this very trusted group where, you know, we've already built all this history together?
Speaker A: There were a few. I think maybe the biggest one was that he immediately got it. At that point, there were a lot of investors and other business people that I talked to.
And it was very hard to get that idea across, the idea that you have to do the research right. You need to build the technology first and you can't be distracted by short-term revenue. And if we do this right, this is going to completely change the world and it's going to be the most valuable business of all time, but you need to have the patience to let us do it the right way rather than short-circuit that path and kind of cap the ceiling. And he got it, I think, within like the first minute.
So that was very reassuring. And then on top of that, what I really liked about our conversations was that in many of those previous conversations with other people, I always felt like I'm the one pushing the ambition of this project, of how far it could go if we set it up right. And I think this was the first time where I had somebody else set up even higher ambitions for us. That was really cool to see because it felt like we're all pushing in the same direction. He will make us even better and more ambitious.
So I was really glad to find someone who will, you know, make sure that we're not gonna short-circuit this. He's fully aligned in doing this in the biggest way possible. He makes us more ambitious, and he's the person with expertise in the areas where we really need more of it. And he'll make sure that those areas are fully aligned with how we want to develop this company. So it was basically a no-brainer at that point.
Speaker B: You had this insight that this was the right moment to go and solve this problem, but at least from the outside, one could imagine many different form factors that might take. Another version of this company might be saying, hey, we're going to try and build the best humanoid robot ourselves, or, you know, we're going to take this approach to getting the data we need to run these models very effectively versus another approach.
How did you sort of land on the version of Physical Intelligence as it is today?
Speaker A: I think from the get-go, we had the thesis for the company, and that thesis was built around all the research that we've done up until this point: that similarly to what happened with language, it's not going to be specialist models that are really going to solve this problem. It's going to be generalist models that work across all kinds of different tasks, all kinds of different environments, and all kinds of different robots.
And it was similar to what we've seen in language, where you would think that the best translator would be just specialized for translation or the best coder would be just trained on code. But it turned out that the way to be the best at all of those fields is to train one generalist that takes in poetry data and coding data and translation data. And then it turns out to be much better than all of those specialists at those specialist tasks. And we started seeing something similar in robot learning, in robotics, that if only we could collect enough data, if that data was very, very diverse, and if we do it the right way, we should be able to build the best generalists.
And if we solve that, that is really gonna allow us to have this world of many diverse form factors and robots that can be very intelligent. So I think the two thoughts that we had from the beginning are these. One is that intelligence has always been the bottleneck for robotics. Rather than trying to start a robotics company that focuses on a specific robot, how can we tackle this problem head-on and just focus on the intelligence? And then the second thought is that the way to solve intelligence is to take the lessons that we learned from vision and language and other fields and really take the foundation model approach, which includes things like cross-embodiment learning, large diversity of data, a lot of real-world data, and do the research necessary to figure out how to build these models.
Speaker B: So you sort of land on this idea of the AI brain for these different robot types, to put it sort of simply. How did you start to think about the right way to gather the data? Because that's clearly such a big piece of it, and obviously there are different pieces, different players that take different approaches there. So I imagine you must have had to reason through that in a million different ways to land on the one you have. Speaker A: I think we're still reasoning through it. I don't think we have all the answers yet.
I think the important piece about how to think about Physical Intelligence is we are not very dogmatic. It's not that we sit down and think very, very hard and then come up with the solution and this is our bet. I think the better way to think about it is that we are really truth-seeking and we run experiments and we don't know the solution, and we know that we don't know the solution. So we want to follow the scientific method and really try to find the truth and really try to find what works and what doesn't, because I think there is a true answer out there.
We just need to be very humble in finding it. So that's, I think, how we arrive at the current set of answers. We run a lot of experiments, we try many different ideas, and we see which ones stick, and then we double down on them. Now, based on the evidence we've seen, how we think it's going to work, or how I think it's going to work: these models will need to be able to absorb very diverse datasets, where it's less about, you know, picking the right data or picking the right way of collecting data, and more about building the engine that allows you to absorb all kinds of data, whether it's videos of people, teleoperation data from robots, data from handheld devices, video data, or simulation data, really anything.
And the more data they can absorb, the better the model will be. And I think we're kind of at this stage of robot learning where we try to throw anything we can at these models, build them in a way that they can absorb as much of it as possible, and get them to the threshold of being deployable, where you can actually deploy them in the world and have robots out there collecting data for real, doing economically valuable tasks. And I think once you're at that threshold where they can actually work and do valuable tasks, that's when you enter this next stage of deploying robots at scale, deploying them in multiple verticals, in diverse environments, and actually delivering value.
And I think that second stage is actually going to be the stage where we get the most data from, where the more robots you deploy, the better the models should get, and the better the models get, the more you can deploy them. And there's a natural flywheel to it. And what's exciting about the moment today, this moment right now, is that I believe we're very close to this threshold. We're already above that threshold. And that's something that is really, really exciting because I think the models will keep on getting better. They'll be able to absorb more and more data, but they'll also have this sustainable source of data that is very, very valuable because that's the data that is the most real, that is the closest to how you actually want to deploy the robots.
Speaker B: For folks that maybe haven't spent as much time digging into this or are sort of coming to it fresh, I would say that one of the experiments, as you think about this data collection, that Physical Intelligence has done maybe more than others is this real-world data approach. Why is that so important and so valuable to get and, you know, so valuable for entering that phase 2 that you talked about that can sort of start the flywheel?
Speaker A: We have a few theories why this is really important, but I think maybe the meta point here is that if there was a different path that worked much better, like maybe through simulation or from videos or something else, we would happily pick that path.
So again, it's not that, you know, we sat down and thought that this is the best path and this is the only thing we're going to do. We run a lot of experiments, we tried out a lot of ideas, and this is the one that seems to be working very, very well. Why do we think that this is important? I think it's kind of difficult to describe it in the absolute. It's, I think, a little bit easier to compare it to other alternatives. So one popular alternative is simulation. And this is in particular an area that had a lot of success, especially in locomotion use cases where you have robots walking around or doing stunts like backflips and things like this.
Most of these methods train in simulation first. And I think for those kinds of tasks, the main complexity is about how you move your own body. It's less about interacting with the world and more about how do I move my legs correctly so that I can walk or I can run. And in that sense, as long as you model your own body accurately, you should be good. If you model it very, very well, your own particular robot, and that transfers from simulation to the real world, you should be able to learn that behavior in simulation and then that's good enough to work in the real world.
Now when it comes to the problem of manipulation, where you're manipulating the world around you, the difficulty is less about modeling your own body, like how you move your arm from A to B, and more about modeling how the world will react to it. So the problem of manipulation is more about this interaction that you have with the world. And I think in this case, it's just much harder to simulate the world around you than it is to simulate your own body.
It's no longer about just a single robot that you need to simulate; you need to simulate everything. And we don't really know how to simulate everything at scale, how to do this accurately enough and scalably enough so that it would work for anything. For every single one of these tasks or objects, it takes really long to get it exactly right, to get all the friction parameters right, to get the simulation behaviors right, and it's just not as scalable. That's been our finding so far.
Speaker B: And so, to try and distill that, it's sort of the case that with simulation you can do some of these, I don't know, maybe this isn't the perfect word, but sort of coarser, larger actions that are more self-contained.
But once you start trying to, yeah, implement picking up the coffee cup or the towel or whatever it might be, you're starting to rely on a simulation that would have to be so good you'd effectively have to, you know, create a simulation that is perfectly similar to reality. And that's where the real-world data starts to become so important. Speaker A: Yeah, you would need to simulate all of the external world. Yeah. To do any one of those tasks. And that's just too costly, too difficult to do. But as long as you can get away with just simulating your own body, I think that works perfectly fine.
And that's what we've seen in locomotion or, you know, backflips or dances or things like that. Speaker B: There was a moment, I think about a year and a little bit ago where you talked about reinforcement learning making a comeback. And that has since become a really interesting part of the way that Physical Intelligence seems to work with ReCap. What were you seeing at the time that made you think this technique might have sort of a second or additional wind here? And why has that been so useful for you? Speaker A: Yeah, there's been a long history of applying reinforcement learning to robots.
And the problem of reinforcement learning is such that you need to be learning from your own experience. And while gathering that experience, you need to encounter some successes and then you want to increase the probability of good actions that lead to those successes and decrease probability of the actions that lead to failures. What this means is that you need to have, as you explore the world, as you collect your own experiences, you need to have some of these successes because if you don't see any of them, you basically don't really know where to go.
You're kind of lost. And if you start from scratch, if you just try to send random commands to a robot, it's very, very unlikely that you'll encounter a success based on that. So let's say you're trying to teach a robot how to grasp an object and you send basically random commands to all of the motors of the arm. It's very, very unlikely that any of these commands is going to lead to a single success of grasping an object. And that's what we often refer to as the exploration problem.
We don't really have a way of guiding the robot on how to explore so that you can encounter a success. And as soon as you encounter it, then you're on the flywheel. You can now increase the probabilities of those actions. It's just that getting to the very first one is very, very hard. And that's been the problem of reinforcement learning for a very long time. But now with the models that we've been putting together, this exploration problem became much easier because robots now don't start from scratch. They start from a foundation model like π0, π0.5, or π0.6, where they already have some intuitive understanding of how motions work and how you can explore the world around you.
So even if you put a new object in front of a robot, the chances are that if you ask it to grasp it, it will either grasp it or it will fail in an interesting way. So the probability of you actually doing something useful starts to become very, very high, or at least much higher than it was before. So now you have a way to explore the world and that allows you to apply reinforcement learning methods much more easily because they can encounter successes much quicker. So this is kind of how reinforcement learning has evolved over time.
And that's what allowed us to start applying it again and seeing if we can see the successes of it. Now, from the other perspective, the reason why we thought reinforcement learning would be important is that the way most of these models are trained today is mostly based on imitation. So you collect a lot of data, whether it's in simulation or from human videos or from real teleoperation data. And then the objective for these models is to try to replicate the actions that you've seen in the dataset.
And the important caveat there is that it's not actually the objective that we care about. We don't care about executing every single action exactly the same as it was presented to you. What you really care about is the success of the task, whether you accomplish the task or not. And we don't really have a way of codifying this objective with these models. They're all optimized for imitation and therefore it's very difficult to drive the success rate of those models because it's not the objective that they care about. They only care about imitating the actions as closely as possible.
Reinforcement learning provides you a way to basically codify this objective, to have them now be optimizing for actually accomplishing the task successfully. Both of these things combined led us to believe that we should start applying reinforcement learning so that we can improve reliability of these methods and have models that work not 70% of the time, but 99.99% of the time because they actually optimize for the right objective. And at the same time, it became possible because now these models can explore in ways that are more intelligent than was possible before.
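[Editor's note: The exploration problem described in this answer can be illustrated with a toy sketch. This is not Physical Intelligence's method; it is a minimal policy-gradient (REINFORCE) example under loud assumptions: a one-step "grasp" task with 2,000 discrete motor commands of which exactly one succeeds, a 0/1 task-success reward, and a stand-in "pretrained prior" that already picks the right command about 20% of the time. All names (`train`, `GOOD_COMMAND`, `prior_logits`) and all numbers are hypothetical. With this bare 0/1 reward, failed attempts produce no update at all, which is the "you don't know where to go" situation; practical algorithms usually add a baseline or advantage estimate so that failures also push probabilities down.]

```python
import math
import random

N_COMMANDS = 2000   # hypothetical number of discrete motor commands
GOOD_COMMAND = 0    # the single command that produces a successful grasp
EPISODES = 150


def softmax(logits):
    """Turn raw scores into a probability distribution over commands."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(initial_logits, episodes=EPISODES, lr=0.5, seed=0):
    """Policy-gradient (REINFORCE) updates with a 0/1 task-success reward.

    Returns (episode index of the first success or None,
             final probability of choosing the good command).
    """
    rng = random.Random(seed)
    logits = list(initial_logits)
    first_success = None
    for episode in range(episodes):
        probs = softmax(logits)
        command = rng.choices(range(N_COMMANDS), weights=probs)[0]
        reward = 1.0 if command == GOOD_COMMAND else 0.0
        if reward and first_success is None:
            first_success = episode
        if reward == 0.0:
            # No success means no learning signal: the update would be zero.
            continue
        # Gradient of log pi(command) w.r.t. the logits is (one_hot - probs),
        # so a success raises the chosen command and lowers all the others.
        for i in range(N_COMMANDS):
            one_hot = 1.0 if i == command else 0.0
            logits[i] += lr * reward * (one_hot - probs[i])
    return first_success, softmax(logits)[GOOD_COMMAND]


# Starting from scratch: every command is equally likely (1 in 2000).
scratch_logits = [0.0] * N_COMMANDS

# Starting from a prior: a stand-in for a pretrained model that already
# picks the good command about 20% of the time (an assumed number).
prior_p = 0.2
prior_logits = [0.0] * N_COMMANDS
prior_logits[GOOD_COMMAND] = math.log(prior_p / (1.0 - prior_p) * (N_COMMANDS - 1))

scratch_result = train(scratch_logits)
prior_result = train(prior_logits)
print("from scratch:", scratch_result)
print("from prior:  ", prior_result)
```

[The reward here is task success itself, not similarity to demonstrated actions, which is the distinction drawn above between optimizing for imitation and optimizing for accomplishing the task. The only difference between the two runs is the starting distribution: the learning rule is identical, but the run that starts from a prior encounters successes early and can reinforce them.]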
Speaker B: To circle back to a topic you alluded to earlier, you mentioned that when you first started chatting with Lachy, he understood the need for patience on commercialization, and you sort of referenced the fact that there's a version of this company where, if you're too impatient, maybe you short-circuit the sort of long-term vision. What are the ways that that happens and how do you have to be alert to them?
Speaker A: I think there's a long history of robotics companies doing this or this happening to them. We're not the first company that starts with a big, broad vision for robotics.
But if I analyze what happened in the past, what often happens is you start with that vision, you start developing the technology, and then because there's usually some external pressure, you try to commercialize this technology at a point when it's not fully ready yet. And you dive into a particular application, maybe the application that has the biggest TAM or something like that. As soon as you do that, you start cutting corners on the technology itself and you short-circuit that vision of it being a very generalist, general-purpose technology that would actually deliver the most value.
But now you're just trying to deliver the most value for that customer or for that particular vertical that you chose. And that's perfectly reasonable, right? Like all of your incentives are tied to that. Your revenue is tied to that. The customer satisfaction is tied to that. Your valuation, you know, how employees think about this. So you have all the incentives in the world to start cutting corners and trying to make a more special-purpose solution. That's usually what happens. So you start with this broad vision of how general-purpose robots are going to be solved, but you very quickly end up becoming an application company, like a warehouse pick-and-place company.
And this would be a heartbreaking outcome for me. I think we really have a chance to solve the big problem, the problem of general physical intelligence. And if we do that, you won't just enable that one application that you could have focused on initially, but I believe you would be able to solve all of it. You would be able to solve it for any robot to do any task. And counterintuitively, that would be the much, much better commercial outcome as well. Speaker B: It's all just about trying to have the right time span in your head.
How does that influence how you think about the right partners for the business? Like, are you optimizing for, I don't know, a range of use cases in that case so that you're getting the diversity of data? Is it about scale and getting as much of that data from real-world deployments as you can? Speaker A: Right now we're optimizing for the rate of learning. So it's still quite early, unlike maybe what you can see on Twitter today. It's not that robots are about to knock at your door and show up at your home and do everything.
I don't think everything is solved yet, or that it's just a matter of scaling up the existing recipes. I think we can push them much, much farther than where they are today, but there's still a lot of research to be done and a lot of questions that are unanswered. So right now we're mostly optimizing for speed and how we can learn as much about the problem as possible. A lot of it does come down to what kind of data to collect, how to integrate the data into the model, and how to have that loop be as tightly closed as possible.
It's a long way of saying that it's a fairly nuanced question. But what we're optimizing for is learn as much as possible so that we can figure out the scalable recipe that then we can just scale as much as possible. Speaker B: Do you have theses at this point on what gives you the steepest rate of learning, whether that's, I don't know, very fine-tuned tasks or something totally different?
Speaker A: There are a few things that we learned. We know that the diversity of data is really, really important. We know that the quality of data is really important. We know that closing the loop with the models is very important. I just feel like these are fairly broad statements that, you know, if you had asked me this a year ago, I would probably have said something similar, but we did learn a lot about what these terms actually mean. I think, you know, very often people talk about the diversity of data or quality of data or something like that.
But we don't really have very good definitions of what quality of data actually is. What does it mean? Or what diversity of data actually means? Or how do you measure it? So I think these are like fairly deep questions and we are getting more and more of a feel of what it actually is and how we can measure it and how we can optimize for it. But it's a constant push and pull on how much do you scale the existing thing? How much can you better understand it? How much can you increase the slope of improvement there?
So I think we'll just continue on the trajectory as we scale it. Speaker B: You mentioned that, you know, there's still a need for real research in this space to achieve the ultimate end goal, but that there's also plenty to optimize, you know, in the sort of thought experiment where we get no new research, How far do you think that sort of takes us? Do you have some rough heuristics of what kind of a robot that is able to produce?
Speaker A: It's a question that we think about a lot because I think one powerful insight from the language model work is that for a long time people thought that we still have a few ideas that are missing or we need a different architecture or we need different this or that. But it turned out that it was good enough. It was just a matter of scaling and we just didn't have enough foresight to predict how scaling is gonna resolve some of the issues that seem like big issues of, of today at small scale.
So I think there is a non-trivial chance that the recipe is already there, that the recipe that we have today would work and would just solve everything, and we just need to scale it in the right way and execute on it really, really well. I'm not certain about this at all, but I think that's an important question that we're thinking about quite a bit. So what we're trying to do is verify it. So start scaling the existing recipe and see how it performs, how it scales.
And at the same time, continue to do research that at the very least could improve the slope of that scaling. Speaker B: NVIDIA, you know, is probably the best example of a company that is putting a ton of money into improving its simulation engines and getting that to higher and higher fidelity. How optimistic are you about that improving at a sufficiently high rate that it really, I don't know, closes the gap in some meaningful way or allows you to get the kind of data at scale that could improve things meaningfully?
Speaker A: So as I mentioned before, I'm quite open-minded about these things. I'm not very dogmatic about the path it's gonna, it's gonna take to get there.
I think simulation over the past, you know, decade plus has been improving like crazy. The simulation that we see today is much more realistic than what we've seen in the past, and it's becoming more and more scalable. I believe that the first place where we're gonna see the impact of it is not going to be data collection so much as evaluation. And one thing we see here at Physical Intelligence is that as these models become more powerful, it takes more and more time to evaluate them.
So now you need to evaluate them for longer: you need more samples to distinguish a model at a 99% success rate from one at 95% than you did to distinguish 50% from 70%. You just need more samples to be statistically significant. On top of that, the repertoire becomes broader and broader, so you need to evaluate them on more tasks, across more robots, in more environments. And that trend, I think, will continue as the models get stronger. You'll need to evaluate them for much longer in much more sophisticated ways.
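The sample-size point can be made concrete with a standard two-proportion power calculation. The sketch below uses the usual normal-approximation formula; the significance level, the power, and the function name `samples_per_model` are assumptions chosen for illustration, not a description of how evaluations are actually run at the company.

```python
import math
from statistics import NormalDist


def samples_per_model(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate trials needed per model to tell success rates p1 and p2 apart.

    Normal-approximation formula for a two-sided test of two proportions:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


n_mid = samples_per_model(0.50, 0.70)   # weaker models, 20-point gap
n_high = samples_per_model(0.95, 0.99)  # stronger models, 4-point gap

print("50% vs 70%:", n_mid, "trials per model")
print("95% vs 99%:", n_high, "trials per model")
```

Under these assumptions the high-success comparison needs roughly three times as many trials per model, because the gap between the two rates shrinks faster than the per-trial variance does. That is before multiplying by the number of tasks, robots, and environments.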
So I think if simulation starts to work in the way that we all hope it will, that's where we'll see the first impact of it, where we can start evaluating these models in simulation across a diverse set of scenarios. And if they're realistic enough, that should correlate with real-world evaluations. Speaker B: You know, we talked about some of the real inflection points for you over the course of your career. I'm sure that Physical Intelligence's biggest inflection points are to come, but over the past couple of years so far, what have been maybe the biggest surprises or the moments that have impressed you the most?
Speaker A: I think the main thing is just how fast all of this has been moving, and how much it's picking up speed as we go. So before we started the company, we thought that it was gonna take us something like 5 years maybe to start deploying robots. And we deployed our first robots, I think, 18 months into the company. So it either means that I'm very bad at making those predictions or it's just been moving much, much faster. There have been multiple of these moments. I think our first model, our first release, π0, was definitely one of those moments, where I had thought it would take a very long time before we have robots folding laundry, especially diverse pieces of laundry, fully autonomously.
This was one of these holy grails of robotics that I thought was gonna take many, many years to get to, and we got there within the first 7 months of the company. I think the next model was another one of these moments, where I thought it was gonna take very, very long to get robots to perform in an environment that they've never seen before. And with our π0.5 release, we got robots into new homes that they've never seen before, and the home is the most diverse environment you can imagine.
So it's the hardest version of that challenge. And that started working as well. Then with π*0.6, we started seeing that these robots can perform tasks at a very high clip, at very high success rates. And now we have videos of robots doing extremely dexterous tasks like making coffee on a proper espresso machine for like 13 hours without any cuts, just 13 hours straight. The robot is making coffee over and over again and cleaning up after itself. So I think all of this is just moving very, very fast and every single release so far has been quite surprising to me.
I thought it was gonna take longer, or that the problem was much more difficult than it turned out to be. But I think that's maybe one thing about working with these models. I think people working on LLMs would probably have similar answers, that they thought it was gonna take much longer to get to a certain level of capability. Speaker B: You know, when you're running a robot for 13 hours straight making espresso, what does a good failure rate look like in that case? Or what are you excited by when you see that happening, if anything?
And then, you know, something I was talking about with a friend who's in this industry is that a lot of the time the failure for these kinds of robots is maybe less obvious, more like the robot almost doesn't realize it's stuck in some way and doesn't find a way out of being stuck in that loop. Is that something that you see a fair amount in this case? Speaker A: I think in general, people don't pay enough attention to this problem of reliability. And that's maybe a big difference between something like a language model or a chatbot and a robot.
When you talk to a chatbot and it makes a small mistake or it just uses not quite the right word, it's totally fine because you're right there and you can absorb all of this, you can correct it. You can ask it again, or you can interpret it the right way. You're very forgiving. And the physical world is the opposite of that. It's not very forgiving. If you make any small mistake, if you grab the portafilter just a little bit off, or you don't insert it quite right, then the whole task is going to fail in a catastrophic way.
So I think the bar on performance is just much, much higher, because there isn't someone to help you, someone to catch you. And I think this is a very big problem. Now, the other thing you point out is more about the model being self-aware of where it's not going well. And the power of reinforcement learning is that it very often needs to have an additional function that tries to predict how close you are to success. We call this a value function. And that value function should have an understanding of how well you're doing so that then you can improve upon it.
And what we found so far is that you can train these value functions to be pretty good. You can train them to the extent where we can't yet tell that the robot is about to fail, where everything is looking okay, but the value function already knows. It starts seeing that the expectation of success is going down. This is actually something that we've also seen with works like AlphaGo, where a computer program learned how to play Go. If you watch some of the games that this AI system played against Lee Sedol, the world champion at Go, what was quite intriguing is that in many of these games, you have the world's experts at Go commentating on the game.
They're all thinking that the game is basically neck and neck. It's like, you know, it's very unclear who is winning, whether it's the AI or Lee Sedol. But then you look at the value function predictions and it's basically over. Like the AI system knows 100% that the game is over. It's already won by the AI and Lee Sedol has no chance. But the entire world, you know, all the experts, including probably Lee Sedol himself, think that this is a very, very close game. So I think we will probably get to similar models in robots, where they will be able to predict how well they're doing, or how likely success is, much better than we will.
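As a rough illustration of what a value function is, the sketch below fits the simplest possible one: a table that maps each state to the average observed outcome of episodes that passed through it. Everything in it is an assumption for illustration, including the two hypothetical states, their success probabilities, and the function names. A real system would presumably learn a neural network over camera images rather than a lookup table.

```python
import random

# Assumed, made-up chance that the rest of the task succeeds from each state.
SUCCESS_PROB = {"good_grasp": 0.95, "off_grasp": 0.30}


def run_episode(rng: random.Random) -> tuple[str, float]:
    """Simulate one attempt: pick how the grasp went, then sample the final outcome."""
    state = rng.choice(list(SUCCESS_PROB))
    succeeded = rng.random() < SUCCESS_PROB[state]
    return state, 1.0 if succeeded else 0.0


def fit_value_function(num_episodes: int, seed: int = 0) -> dict[str, float]:
    """Monte Carlo value estimate: average final outcome seen from each state."""
    rng = random.Random(seed)
    totals = {state: 0.0 for state in SUCCESS_PROB}
    counts = {state: 0 for state in SUCCESS_PROB}
    for _ in range(num_episodes):
        state, outcome = run_episode(rng)
        totals[state] += outcome
        counts[state] += 1
    # max(..., 1) avoids dividing by zero if a state was never visited.
    return {state: totals[state] / max(counts[state], 1) for state in SUCCESS_PROB}


values = fit_value_function(4000)
print(values)
```

In this toy setup the two grasps might look almost identical to an observer at the moment they happen, but the fitted values already separate them, because the estimate is trained on how such episodes eventually ended.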
Speaker B: One of my favorite activities to do with my son is he loves to look at my bookcase and just pull out whatever covers he finds interesting. He's, you know, 15 months old, so he's just doing it based on, you know, what interesting face is on the front or whatever. And he's gotten really interested in Oliver Sacks' biography called On the Move, because, I don't know, it's a picture of Oliver Sacks. It's, you know, engaging. And I was just paging through it with him last week, and I was already starting to think quite a lot about our conversation.
And it lands on this page where Oliver Sacks talks about being obsessed with proprioception and how this is, you know, the sixth sense. And it's actually the most important sense because, you know, you can strip humans of all the other five and they can more or less get along. But if you, you know, get rid of this sense of where your body is, life becomes extremely hard. And he talks about working with these patients who, you know, maybe have a virus and they lose this sense of their own body.
And he has this one case which made me think a lot about what you're doing and maybe what the final state of it might look like, which is this guy gets a virus and he loses the sense of proprioception, but he compensates with visuals where, you know, if the lights go off in a room, he can no longer move. But as long as he can see where he is, he can sort of walk, but not walk and talk, et cetera, et cetera. And so it's basically the way that the brain compensates for the loss of this sense.
Is that sort of the stage we're at at the moment with physical intelligence, where we don't have true proprioception and so we are sort of compensating with these other senses? And is there a point in the future, do you think, where however a computer is able to emulate that, we have something that really feels equivalent to it? Speaker A: Yeah, there are many neuroscientific experiments showing something like this, how you can disable one sensing modality and then compensate for it with another one. I think in general it's quite unclear what the right sensing modalities should be.
I think our intuitions are often wrong about this, because these machine learning algorithms can extract the right level of information from signals that to us seem insufficient. A common question I get is about touch sensing on our robots. Instead of touch sensors, we put wrist cameras on them, and it turns out that they can compensate for the lack of touch just fine. They probably find some kind of regularities in the data, where maybe you see the deformation on the gripper or something like this that indicates how much pressure you're applying.
And that's enough for it to fully compensate for the sense of touch. It might be that we'll find some tasks where that's not the case and you really need some other sensing modality to give you that signal. But so far, I think it's been quite surprising to the entire field how far you can push it with just simple cameras. I think in general, though, a lot of intelligence is about compensating. And what I mean by this is predicting what's going to happen and then figuring out how different what actually happened is from what you predicted.
And it doesn't just apply to sensing modalities. I think it also applies to precision. So for instance, as a human, you're very imprecise when it comes to movements, right? Like if I ask you to put your finger on a specific point 100 times in a row and I measure this exactly, there will be quite a bit of variance, especially compared to modern industrial robots that can go to submillimeter accuracy with very good repeatability. But if you compare our dexterity to the dexterity of those robots, you know, it's night and day.
We're way more dexterous even though we're a way less precise machine. And I think the reason for this is that we have physical intelligence that can compensate for it. We have enough feedback that if we just look where our fingers are, or if we just feel it, we can compensate for it immediately. And I think we'll see that being translated to robot design as well, where maybe the robots of the future don't need to be as precise as we had thought, or, you know, they can accommodate a lot of things like a little bit of backlash in the motors or just noise, because with the right intelligence you can compensate for it.
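The idea that feedback can stand in for precision can be sketched with a toy one-dimensional actuator. The model below is an assumption made for illustration: the actuator delivers only 80% of each commanded move and adds a little noise, and the names `imprecise_move`, `open_loop`, and `closed_loop` are hypothetical. It is a plain proportional feedback loop, not a learned policy and not a model of any specific robot.

```python
import random

GAIN = 0.8        # assumed: the actuator delivers only 80% of each commanded move
NOISE_STD = 0.01  # assumed: extra random error added to every move


def imprecise_move(position: float, command: float, rng: random.Random) -> float:
    """Hypothetical sloppy actuator: undershoots the command and adds noise."""
    return position + GAIN * command + rng.gauss(0.0, NOISE_STD)


def open_loop(target: float, rng: random.Random) -> float:
    """Command the full move once and never look at where we ended up."""
    return imprecise_move(0.0, target, rng)


def closed_loop(target: float, rng: random.Random, steps: int = 10) -> float:
    """Observe the position after every move and command only the remaining error."""
    position = 0.0
    for _ in range(steps):
        remaining = target - position  # feedback: compare where we are to where we want to be
        position = imprecise_move(position, remaining, rng)
    return position


rng = random.Random(0)
open_error = abs(1.0 - open_loop(1.0, rng))
closed_error = abs(1.0 - closed_loop(1.0, rng))
print("open-loop error:", open_error)
print("closed-loop error:", closed_error)
```

With the same imprecise hardware, the open-loop move is left with the actuator's full undershoot, while the feedback loop keeps shrinking the remaining error until only the per-move noise is left.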
And that's what we've seen here at Physical Intelligence. We show some of the most impressive demos ever on robots, but these robots are really bad robots. If you compare them to any state-of-the-art industrial machine, they're really, really bad. They're very imprecise, not very reliable, with a lot of backlash and other problems, but it doesn't really matter because the right intelligence can compensate for it. Speaker B: Well, that was a very interesting and gracious way of answering a very baggy question from me. So I'm glad I asked it because, yeah, that was fascinating.
As a final wrap-up question, I always like to ask folks: if they had the chance to assign a book to everyone on earth to read and know that they would understand it, what is a book you would love to give everyone? Speaker A: I recently reread a book that I'm a big fan of. I'm not sure if this is the book I would recommend to everyone, but it's a book that I think is really, really good. It's Why Greatness Cannot Be Planned by Ken Stanley.
Yeah, I think it's just a wonderful, counterintuitive book that shows multiple examples of how we arrive at something really spectacular without planning for it. I find it quite inspiring and quite motivational in a way that is maybe a little less straightforward than usual. So I think it's just a very insightful book that would help a lot of people shed new light on how to think about innovation and achieving something spectacular. Speaker B: Well, that's a perfect place to end. Thank you so much, Karol. Thank you.
That's it. Thank you for listening to this episode of The Generalist Podcast. Speaker C: Please subscribe on Apple Podcasts, Spotify, or your preferred podcast app. Ratings and reviews help others discover these discussions, so if you enjoyed the conversation, I'd be grateful if you could take a moment to leave one. For all past episodes and more, visit us at com. Speaker B: See you next time as we continue to explore the future.