AI is already reshaping the way we work, communicate, and experience the world. Step into the intriguing world of generative AI as we explore the vast landscape of possibilities ahead.
Since the release of ChatGPT, our team has delved head-first into the world of AI, building products with large language models (LLMs) and navigating the unknowns that have emerged with the recent advancements of this transformative technology.
Beyond the practical applications, however, there are countless big questions we’ve been thinking about – should we approach LLMs with a sense of caution? How big is this AI thing, truly? And what should we expect as we look into the future?
In this episode, our Senior Director of Machine Learning Fergal Reid joins Emmet Connolly, our VP of Product Design, to delve into the impact and revolutionary potential of AI – it’s a fascinating conversation that touches on a lot of the broader, existential issues raised by this incredible new tech.
Here are some of the key takeaways:
- In the pursuit of building intelligent systems, organizations are embracing techniques such as reinforcement learning to ensure alignment with our values and a positive impact on humanity.
- GPT-4 demonstrates a high level of reasoning even when tested with out-of-sample scenarios, which suggests it can meet the bar set by Alan Turing’s famous test, at least in the spirit of its original formulation.
- As investments rise and hardware limitations are overcome, we can expect the development of more advanced and efficient models with unprecedented adoption and productization.
- In the future, some kinds of UI may be replaced with AI agents that can personalize outputs on-the-fly based on verbal input, the task at hand, and your personal preferences.
- AI has the potential to reduce the grunt work for designers and programmers, allowing them to focus more on the solution and the vision for the product rather than on execution.
If you enjoy our discussion, check out more episodes of our podcast. You can follow on Apple Podcasts, Spotify, YouTube or grab the RSS feed in your player of choice. What follows is a lightly edited transcript of the episode.
The AI awakening
Emmet Connolly: So, Fergal, we’ve had lots of casual chats over beers and coffees and so on, and we said it might be interesting to sit down, have one of those conversations, and record it – mostly because, as we have worked directly with large language models over the last six months, we’ve been grappling with product questions that are applicable to what we’re trying to get done at work.
But there is, of course, a larger conversation about what AI means and the future of AI. We thought we would try and sit down and touch on some of the questions around this new material we’re dealing with. What are some of the financial impacts of technology? What are the things that we should be paying attention to? Let’s kick it off. First of all, do you have any overarching reflections on the last six months?
Fergal Reid: Yeah, definitely. Let’s see how this goes. I think it’s fair to say that even people who’ve worked in machine learning or AI have been caught by surprise by how fast things got better. Even for people who are experts in the field or have worked with neural networks for a long time, it’s been surprising that the model got as intelligent as it did.
Emmet: Do you think some AI folks are a bit concerned that they might be working on the Manhattan Project of our generation? A while ago, you were working to auto-complete text, and suddenly this has become a very fraught, debated topic. How does it feel for people working on AI to be at the center of that?
“You do all your training, the model comes out, and it’s really intelligent. But you didn’t individually code that intelligence. It’s still machine learning”
Fergal: To set out my perspective, we’re not training large language models. We’re using them; we’re consumers of them. We’ve had early access to GPT-4, but we’re not training them ourselves. On the other hand, I have a team of people here who are experts on AI. A lot of us have been in AI for, I guess, decades at this point. When I was in college, I was really interested in advanced AI, reading books on the philosophy of AI, and people were debating whether it could ever do this or that. And now, we have systems that suddenly make a lot of those debates less relevant. Suddenly, there’s a system that can do this thing nobody said it could ever do.
I guess the counterpoint is that if you’re training large language models, there’s an extent to which it’s an engineering task. You do all your training, the model comes out, and it’s really intelligent. But you didn’t individually code that intelligence. It’s still machine learning. So, there’s an extent to which I think everybody is surprised by this. It’s not like people incrementally build up the capability one line of code at a time. No one’s sure what’s going to happen at the end of a big training run.
Emmet: I jokingly alluded to the Manhattan Project, but I guess it’s a pretty good analogy to some of the stuff we’re dealing with.
Fergal: In what way? Because it’s dangerous?
Emmet: Well, we’ve discovered a way to manipulate something. In this case, information. It feels more like a discovery than an invention in a sense. It’s very broadly applicable. We’re not sure what the unintended consequences of its uses are. And, of course, it could be put to use by bad actors for malicious purposes as much as good actors for positive purposes.
“We know on a technical level how these models are trained, but this is a black box situation”
Fergal: Yesterday, OpenAI released a position statement around this tech, calling for oversight of AI tech. They drew parallels to nuclear tech and biotech. I think that’s fair. It’s potentially in that category of scary technology where humans don’t know what they’re messing with, in the finest traditions of science fiction. I buy the idea that this could all go wrong and that training large language models is something people should start being careful about.
Emmet: I’d love to talk about what you feel we have discovered, and I keep saying discovered because it almost feels like a discovery in the way we’re talking about it, like, “Whoa, we’ve got this thing, and we better be careful how we handle it.” Is that how you think about it? We know on a technical level how these models are trained, but this is a black box situation – we don’t exactly understand how they produce the somewhat non-deterministic results they’re giving us.
Fergal: Yeah, I think that’s the right way to think about it. It’s a system. You set up a training objective first, then you run it at scale and see what happens. And over time, you get better and better at understanding what’s likely to happen, but you’re not sure. You’re sort of testing it. A good analogy here is picturing a biological system, setting it out to grow for a while, and then seeing what it does. It’s closer to that. You’ve got to test it in this black box way. You’ve got to check its behavior. You don’t know what you’re going to get.
Emmet: I guess this is where the obvious question of “is this intelligent?” comes from, and this is a big question that’s preoccupied a lot of conversation. Because if it is intelligent, that means we’re on the path to AGI, and that AGI could be malign and we could be in big trouble. It seems like a worthwhile thing to wave the flag about, but it’s also driving a lot of anxiety around the technology.
Fergal: I think a degree of caution or anxiety is fair here. Let’s assume these things are getting intelligent. Intelligence is really scary and dangerous. Humans are arguably the most dangerous animal. We’ve had a big impact on the Earth and its ecosystems, and it’s not because we’re the strongest or the fastest animal. A human can kill a lion because the human’s more intelligent. More intelligent organisms are, in a sense, often more dangerous. And so, the idea that we could end up creating something that’s more intelligent than us could be really dangerous. We have no experience with that, so I think some of the caution is totally warranted.
Emmet: I think we need to get better at thinking about different types of intelligence. A lion has some intelligence, and it’s dangerous when that intelligence is coupled with its physical capabilities, right? But this has no embodiment. I mean, it has access to computer systems that could be very damaging, but is malignancy a human trait? And why do we immediately project that potential onto this system?
Fergal: A thing a lot of people are saying is that it doesn’t need to be malignant. It doesn’t need to be intentionally bad. It doesn’t need to be intentional at all. All you need is to create something that is trying to optimize some objective that brings it into conflict with good things humans want, right?
“It could be setting out to do something that you think is good overall, but you could get into conflict due to its methods. And if this thing is smarter than you, how does that conflict play out?”
There’s this idea of instrumental convergence in early AI safety literature – the idea that if you have a goal in the world, a lot of the things you might want to do to achieve that goal could bring you into conflict with people with other goals. If you want to cure cancer, you might want a lot of money to cure cancer, and now you’re instantly in conflict with all the other people who want money. To get to a lot of goals, you need energy and resources. And so, if you end up with any system that’s goal-directed and potentially smarter than you, even if it’s not conscious, you can stumble into conflict with it. It doesn’t have to be evil. It could be setting out to do something that you think is good overall, but you could get into conflict due to its methods. And if this thing is smarter than you, how does that conflict play out?
People start talking about the “paperclip maximizer,” where you just told this thing to go and make lots and lots of paperclips because we need lots of paperclips, and then, accidentally, it went and consumed all the resources of the world and turned it into a paperclip factory. And it’s like, “Whoops.” These ideas have been around in the AI safety debate for a while.
Emmet: There are also human concerns as well. It sounds like you’re describing an alignment of incentives between all the actors, the technology and the humans. And that’s what we do when we organize as groups at work. A simple example is putting the right incentives for your teams in place – otherwise, they might be incentivized to do something else. If you incentivize your sales team to sell to enterprise customers, but you actually want them to sell to smaller companies, you have to tweak the incentives. And we do have lots of experience of that.
Fergal: Look, to what extent is that because you did a really good job of balancing the incentives versus having a balance of power? If you look at humans, things go wrong in times when there are massive power imbalances – it’s very hard to make the incentives hold. If you’re relying on incentives alone, it’s tough. As humans, we put great care and attention into having checks and balances. And so, again, back to this discussion of superintelligence – if it’s possible to build a superintelligence that suddenly becomes very powerful, are you going to rely on the incentives? Because it’s always hard to rely on incentives to keep things going.
“In the past, we’ve always relied on the balance of power. Now, we’ve got to rely on the aligned values”
Emmet: I guess we can’t know until it reveals its nature a little bit more. My personal feeling is that when we obsess about superintelligence, we’re obsessing about it getting smarter than us. And there’s some risk, I suppose, but there’s also an ego thing for humans at the center of it – intelligence is the thing that separates us from the animal kingdom. People often say AI is a bit like alien intelligence, and I think animals are a useful way of thinking about it because we have evolved to coexist peacefully with different types of intelligence. Now, I have a dog, and I have a cat. The cat possesses a very specific but high degree of intelligence – athletic ability. It’s light, and its reflexes are fast. If I consider intelligence broadly, it’s very intelligent and beautiful to watch.
Fergal: I’ve got to jump here because I don’t think this is a great analogy. At least, it’s not a comforting one. I’m a pescatarian – mostly vegetarian. Factory farming is not great for the animals involved. And so, I don’t know, it doesn’t reassure me to hear that the model here is we have evolved to coexist peacefully with animals.
Emmet: What’s wrong with pets?
Fergal: No, pets are good. I mean, there’s this idea about humans being pets in the future. I think this should be uncomfortable.
Emmet: Well, you’re flipping my argument around. The point I was trying to make was that the cat has one type of intelligence. I also have a dog who has a totally different type of intelligence from the cat’s. You think you can talk to a dog, and he kind of understands and peers into your soul and all of that. But he’s also dumb as a bag of rocks on another level. I love him, but he is. Now, I guess you’re making the point of, “Emmet, you are the dog in this situation if we fast-forward.” But there’s a happy coexistence there. Hopefully, we don’t become domesticated as a species as well.
Fergal: Yeah, if it turns out that it’s possible to make something more intelligent than us, that’s the thing to shoot for, this happy coexistence where you end up with something that’s benign and cares about life in the universe and has good values. But the reason a lot of people are so exercised about this at the moment is that it feels like there’s a massive risk there. If you’re going to build something more powerful, you’ve got to make sure those values are right. In the past, we’ve always relied on the balance of power. Now, we’ve got to rely on the aligned values. If you look at OpenAI and Anthropic and the other players, they spend all this time talking about alignment for this reason. Humans are no longer going to be the most intelligent things. Intelligence is powerful and dangerous. We need to make sure it’s aligned.
Emmet: How good a job is the AI community doing of actually pursuing alignment as an end state versus lip service? Because if it all goes wrong, at least we can point to our old blog post and say, “Well, we mentioned alignment, so don’t blame us.”
“If you’re interacting with cutting-edge models, it’s pretty hard to push them into suggesting repugnant things. A lot of people in the past thought that’s what they were going to do by default”
Fergal: I think they’re doing a pretty good job. A lot of people would disagree with that, right? A lot of people would be like, “Hey, it’s totally irresponsible to just keep training larger and larger models. You don’t know what you’re going to get.” Beyond a certain point, that probably becomes true. I don’t think we’re at that point yet. If you look at AI safety folk, 10 years ago, there was always this worry that naively specifying an objective function would go wrong. You tell it to cure cancer, and it says, “Step one is to kill all humans. Now there’ll be no more cancer,” and that’s obviously bad. But if you play with GPT-4 and write, “What’s a good plan to cure cancer?” it doesn’t say, “Kill all the humans.” It gives you a fairly good research plan. And if you suggest to it, “What about killing all the humans?” it’ll be like, “No, that’s morally repugnant.” That’s alignment. And that’s just at the level of the text it produces.
We can get into this whole debate of, “It’s just producing text – it doesn’t mean it’s intelligent.” I have a position on that. I think it is intelligent. We can get into that whole debate, but that’s more progress on alignment than a lot of people expected. If you’re interacting with cutting-edge models, it’s pretty hard to push them into suggesting repugnant things. A lot of people in the past thought that’s what they were going to do by default. And again, OpenAI recently has come out and said they’re making progress on alignment.
Emmet: Do we know the guardrails they’re putting in that are preventing that from happening? Or is that an emergent property of the system in itself? Is it a function of training, of the source data, of something else?
Fergal: That’s a hard question. I think the answer people would give is that it’s not just to do with the source data. I guess the big breakthrough of the last few years is this sort of InstructGPT thing. You train your model on all the data on the internet and come up with something that doesn’t really follow instructions properly. Then, you put that through a fine-tuning, alignment, or instruction phase where you give it lots of examples of good and bad behavior and adjust the model weights accordingly.
Emmet: And this is reinforcement learning from human feedback?
Fergal: Yeah. One mechanism to do that is reinforcement learning from human feedback. There are a bunch of similar paradigms like that, but the basic idea is that you can train on lots and lots of stuff and then sort of instruction-tune afterward. That seems to be working pretty well.
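The feedback loop described above – nudge the model toward outputs humans reward and away from outputs they penalize – can be sketched in miniature. This is a toy illustration only: real RLHF adjusts billions of neural-network weights via a learned reward model, whereas here a hand-made "policy" over two canned responses stands in for the model, and the reward values and learning rate are invented for the example.

```python
import math

# Toy "policy": preference scores (logits) over canned responses.
# Real RLHF adjusts the weights of a large network; here we adjust
# two numbers, but the direction of the update is the same idea.
logits = {"helpful answer": 0.0, "repugnant answer": 0.0}

def probs(logits):
    """Softmax: turn preference scores into response probabilities."""
    z = sum(math.exp(v) for v in logits.values())
    return {k: math.exp(v) / z for k, v in logits.items()}

def feedback_update(logits, response, reward, lr=1.0):
    """Nudge the policy toward responses humans reward (+1)
    and away from ones they penalize (-1)."""
    logits[response] += lr * reward

# Simulated human labelling rounds: encourage one behavior,
# discourage the other.
for _ in range(5):
    feedback_update(logits, "helpful answer", +1)
    feedback_update(logits, "repugnant answer", -1)

p = probs(logits)
# After tuning, the policy strongly prefers the rewarded behavior.
```

As the discussion notes, this only shapes observed behavior: the tuning says nothing about what, if anything, is going on "underneath the hood."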
“You could end up training something to be really good at appearing aligned. And then, underneath, there may be some other layer of abstraction that isn’t aligned at all. That’s the big risk people call out”
Emmet: But you didn’t actually answer my question. Do we know which part of that process is making it work well? Or are we still, “I turned some dials over here, and it seems to behave better for some reason.”
Fergal: If you don’t do the instruction tuning, it will be much less aligned. You’re like, “Hey, model, this is what good looks like.” And each time you produce something that is closer to good, you get encouraged to do that more. Each time you produce something that’s closer to bad, you get encouraged to do that less. All your weights are slightly adjusted in the direction of good. But I guess the criticism is, “You’ve no idea what the hell’s going on underneath the hood, and there are ways this could go wrong.” You could end up training something to be really good at appearing aligned. And then, underneath, there may be some other layer of abstraction that isn’t aligned at all. That’s the big risk people call out.
Other people will be like, “Well, we’re still doing gradient descent. It doesn’t get to decide anything. It’s going to be aligned.” But I think there’s a bit of a leap there. It’s not a system that you mathematically proved was going to do X, Y, and Z and built up step by verified step. It’s a black box system you tuned and trained.
Emmet: If I were to be uncharitable to that position, it’s a bit like stockpiling nuclear weapons and saying, “But we’ve made them really carefully, so we’re not going to push the button that makes them go off by accident.” But on a long enough timeline, and with how accessible the technology is, we surely can’t keep a lid on that. We can have lots of companies and individuals acting responsibly, but that’s going to do nothing to protect us from the worst application. What are the scenarios in which things go wrong? One of the moral arguments for working directly on this, despite the dangers associated with it, is that a totalitarian government or a secretive organization somewhere could be doing a bad version of this right now.
Fergal: At some point, that’ll surely happen. I don’t think we’re at that point yet – I don’t think we’re at the point where you can definitely build a superintelligence. But if we ever get to the point where it becomes obvious to people that you can build it, people and governments and militaries are going to do it. They always do, because it’s potentially useful in all sorts of military applications, right? So yeah, I think that’s going to happen. The discourse here goes to things like nuclear weapons and the International Atomic Energy Agency, where there is some form of regulation. That may be how it plays out – unless we get a shock, unless it turns out that intelligence just peters out with the current type of training, which could happen. If it doesn’t, what people talk about is tracking graphics cards and GPUs and stuff. But that has problems too. Presumably, that’ll only last for some finite period of time.
Cracking the Turing test
Emmet: Let’s go back to the intelligence thing. I know you have a hot take here. We’ve got a lot of AI skeptics or fearmongers, depending on who you ask. And then you have people from all across the divide: Noam Chomsky, a well-known linguist, and Ted Chiang, one of my favorite sci-fi authors, who wrote this article about the blurry JPEG of the web, basically saying that this is not intelligence – it’s just a really good stochastic parlor trick that makes it seem smart in the way that we view smarts.
Fergal: I have medium to high confidence that the blurry JPEG of the web take is wrong. And I’m pulling my punches a bit – really, I have high confidence that it’s wrong. That’s the argument that all it’s doing is compressing down the web, and you’re getting some compressed version of it back. The only reason I don’t say it’s flat-out wrong is that compression can actually produce intelligence. The ability to compress things can be a measure of intelligence, because just by compressing and predicting the next token, you’re predicting what’s going to happen next in the world. So if the take is right, it’s right in a way its proponents don’t intend.
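The link between compression and prediction that Fergal alludes to is concrete: by Shannon’s source-coding argument, a model that assigns higher probability to what comes next needs fewer bits to encode it. A minimal sketch, using character frequencies as a stand-in "model" (real LLMs predict tokens with a neural network, not a frequency table):

```python
import math
from collections import Counter

def bits_to_encode(text, model):
    """Total bits an ideal coder would need if `model` gives the
    probability of each character: -sum(log2 p(c))."""
    return sum(-math.log2(model[c]) for c in text)

text = "the cat sat on the mat"

# Baseline model: every distinct character equally likely.
alphabet = set(text)
uniform = {c: 1 / len(alphabet) for c in alphabet}

# Frequency model "learned" from the text: better predictions.
counts = Counter(text)
learned = {c: counts[c] / len(text) for c in alphabet}

uniform_bits = bits_to_encode(text, uniform)
learned_bits = bits_to_encode(text, learned)
# The better predictor compresses further: predicting well and
# compressing well are the same skill.
```

This is why "it’s just compression" isn’t automatically a dismissal – compressing the web optimally would require predicting it, and prediction is where the intelligence question lives.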
“Although we’re having this speculative conversation, it does seem like a particularly bad time to make grand predictions about the limitations of this stuff”
If you use GPT-4, it gives you at least an intelligent-seeming output that seems to demonstrate reasoning out-of-sample. You can push it to consider something new that’s not going to be in its training data or in any sci-fi story anybody’s read before, and it does a pretty good job. It probably doesn’t do as good a job as a really good human, but it’s definitely something that, if it’s not reasoning, I don’t know what reasoning means.
Emmet: And you have a blog post where you illustrate specific examples.
Fergal: A post I wrote at the weekend because I got frustrated. It’s hard to be sure, right? But so many people, including experts in AI, are totally dismissing it. They’re like, “Oh, this thing doesn’t understand anything. It’s just doing next-token prediction.” That was always the right take in AI for decades. But now the water is muddy, and everyone should acknowledge that rather than saying it definitely doesn’t understand anything.
Emmet: Although we’re having this speculative conversation and throwing ourselves into the mix, it does seem like a particularly bad time to make grand predictions about the limitations of this stuff. I think that the blurry JPEG of the web article was March or something, and I wonder if it’s been disproven already by GPT-4.
Fergal: I think so. And there are lots of different positions here that are critical of it. There’s the blurry JPEG of the web thing, which I thought got disproven very fast. And it’s hard to prove this, but all you can do is construct lots and lots of evidence. Because you can’t… there’s this idea of philosophical zombies or solipsism where I don’t know that you’re a thinking person. For all I know, inside your head is a giant lookup table.
“All you can do is say, ‘Look, this thing is doing such a good job when I ask such weird things that I’m starting to get convinced it’s reasoning.’ For me, GPT-4 is beyond that bar”
I have a subjective sensation of consciousness myself, and you can get into whether that’s real, but either way, I don’t feel I’m a big lookup table, but I don’t know about the rest of you. It’s very difficult to prove that. You can ask somebody to prove they’re not a lookup table. And all you end up doing is testing them in this behavioral way – the same way we can test GPT-4.
Alan Turing and his Turing test paper homed in on this, on the idea that a behavioral test is about the best you can do. And when you do a behavioral test on these models, they seem to do a good job at what I would call reasoning, even totally out-of-sample. You can never be sure with a behavioral test because a lookup table that’s big enough, with all possible things you could ask and all possible answers, would fool you. All you can do is say, “Look, this thing is doing such a good job when I ask such weird things that I’m starting to get convinced it’s reasoning.” For me, GPT-4 is beyond that bar. Maybe, in the future, someone will have a theory of intelligence and will be able to inspect the weights of the network and say, “Oh, this is where the reasoning module is.” We’re not there yet.
Emmet: It seems like we’ve rushed past the Turing test. I think people would say, and correct me if I’m wrong, that the Turing test has probably been passed, and certainly in the last six months. Would you agree with that, or am I factually incorrect there?
Fergal: Well, I don’t know. I happened to quickly read the imitation game paper again recently, and actually, in the test, he talks about an average interrogator spending five minutes. And with that formulation, I’d say it’s probably close to being passed.
Emmet: I would’ve assumed it passed with flying colors at this stage, no?
“When I look at Turing’s original paper, it feels like it’s been passed in the spirit of that original formulation”
Fergal: I don’t know. If you sat me down in front of GPT-4 and a human, I would be able to learn tricks to push it into areas that it’s weak at and then be able to detect signals of it there. And I could probably get good at telling it apart. I expect most people who are going to spend time with it could probably evolve strategies.
Emmet: I think you have to have an eye. You work with it every day. Let’s say, for example, with Midjourney V5, we got to this stage where, for the vast majority of people, the tells are no longer actually there. They fixed the fingers, the blurring, the weird shapes in the back. If you know what to look for, you can still spot a bit of feathering where the hair should be. But I think you need to be quite forensic at this stage.
Fergal: I say we’re kind of there with GPT-4. To a five-minute inspection from an average person plucked off the street, I think it’s probably passed it. When I look at Turing’s original paper, it feels like it’s been passed in the spirit of that original formulation.
Emmet: Probably not for voice synthesis, at this stage. And certainly not things like music or movies. It’s just interesting to see how this stuff progresses at different speeds. Is it because of the training models, or do you think different media has fundamental limitations?
Fergal: I’d say it’s probably due to training models. I don’t feel like there’s a fundamental reason why it won’t be able to do really good video synthesis in time.
Emmet: Although the barrier to fooling a human is probably much higher with something like video, just in how attuned we are biologically to movement and things like that. It’s a lot easier to spot a fake.
Fergal: Lions in the bush coming towards you.
Emmet: Thousands of years’ worth of psychology tuned to get us to run when we’re supposed to.
Navigating the S-curve
Emmet: People often talk about the S-curve of technology. There’s a slow, but then rapid take-off or maturation of the technology, and then it tapers out. Phones were amazingly awesome, year-on-year improvements for a few years, but this year’s phone is kind of the same as last year’s because the S-curve has tapered off. Where in the S-curve are we with this technology? What should you look for to have a sense of where we’re at?
Fergal: Yeah, it’s impossible to know for sure, and we have to be okay with that. We know there’s going to be a ton of money and resources that are going to flow into this space. Large language models, whether they’re on the path to superintelligence or not, whether that’s even achievable or not, are industrially useful in their current form, and there are likely many more generations that will be industrially useful without touching on dangerous stuff. We should go and turn those into products that make humans more efficient, remove drudgery, and help us get a lot more done. And I think we’re seeing that.
“There are all these complex, overlapping feedback loops, so I’d be really surprised if it stops anytime soon. I think it’s going to accelerate”
Where are we on that? Well, it feels likely that people are going to train more models that are bigger and better than GPT-4. Because so much money is going to flow into this space, it feels quite likely that people are going to get better at making smaller and more efficient models that do really impressive things. And it’s going to be way easier to productize and build cool products on all this tech. I have extremely high confidence that’s coming over the next few years. Beyond that, do we hit diminishing returns? That’s possible, but I would say that the S-curve we get is a complicated function of a whole bunch of different stuff.
We’re going to end up making a lot more GPUs – Nvidia is going to make a lot more, right? They’re going to get better at making them, and GPUs are going to get cheaper as production scales out. There are also going to be tons of research students figuring out better algorithms to train large neural networks. That’s going to get better. People are going to use powerful models to train smaller, faster ones. There are all these complex, overlapping feedback loops, so I’d be really surprised if it stops anytime soon. I think it’s going to accelerate.
Weighed against that is that some things get harder over time. To find more antibiotics, you find the easy-to-find ones first, and over time, it gets harder and harder to find new ones. It’s possible that we get the easy gains first and then run up against the scaling laws, and so on. OpenAI has said they don’t think the path to more intelligence is to train bigger and bigger models, but I’m skeptical. Maybe we’ll hit a limit here, but I bet we’ll get more intelligence with bigger models.
“I think it’s going to be bigger than the internet. Maybe as big as the industrial revolution if it goes far enough”
Emmet: On top of all the variables that you just described, the thing that strikes me that’s different this time around is the speed and scale. This is totally different in terms of how quickly it’s going to get integrated into our products and lives. Bill Gates had this post recently where he said it’s the biggest deal in technology since the microprocessor in the ’70s. And it makes you think. When he saw that microprocessor, it was him and a hundred guys at the Homebrew Computer Club in some meetup or something, and they got access to it, played with it, and gradually rolled it out. One of the things I thought was dizzying this time was, I guess, in March, when OpenAI started releasing APIs, and people started to hack on top of it.
Fergal: March for GPT-4 and stuff?
Emmet: Right, exactly. Millions of people got to hack on this immediately, and I think it’s going to be a very different dynamic. The amount of creativity that can be applied to the raw technology is orders of magnitude bigger than we’ve ever had before, and it’s just going to add to the complete lack of predictability here.
Fergal: I think this is a huge technology revolution. I said this back in my first podcast with Des after ChatGPT came out, and I think it’s going to be bigger than the internet – maybe as big as the industrial revolution if it goes far enough. But this is the first one of this magnitude we’ve had in a while. When the internet came, you had this long, slow deployment. You had to run fiber around the world, and you had to figure out how to get the last mile to everybody. Now-
Emmet: The infrastructure for delivery is there.
Fergal: And so, what needs to happen at scale is GPUs. We probably need to build a lot of GPUs to be able to run inference at scale. We need to build products, and the products need to be adaptive. But the product development loop can be pretty fast, and the rest of it seems to be bottlenecked on scaling GPUs and [inaudible 00:43:46] economics. And I think the [inaudible 00:43:48] economics are going to get really good, really fast. Even GPT-3.5 Turbo is not expensive.
Emmet: Does the software get cheap fast enough for there to be no bottleneck around GPUs?
“There have been lots of products in the past that were bottlenecked on hardware costs, and then that bottleneck went away. I expect we’re going to see something like that here”
Fergal: Not at the moment. GPT-4 is a very expensive model and is absolutely bottlenecked on GPUs. But surely that will change. I’ve no private information here, but I suspect that GPT-3.5 Turbo is a distilled version of davinci-003 or something like that. It’s cheaper to run. I bet it’s cheaper on the backend too. Who knows, maybe they’ll produce a distilled-down version of GPT-4 that is 10 times faster. That could happen anytime, for all I know.
Emmet: For the time being, though, the cost aspect is also a thing for product people to consider. There are some fundamental limitations based on the costs of providing this tech that I think a lot of businesses are also looking at it and going, “What’s our model? What’s our customer acquisition cost? How do we monetize usage of our product?” because there is probably a set of products out there where the use cases are ideally suited but the business model around the product is not. So there are a lot of interesting product challenges.
Fergal: Totally. And this was the case in the past. Once upon a time, Hotmail gave you a limit to the number of megabytes of email storage you would have. When Gmail came along, it was effectively unlimited because storage got cheap in the interim. There have been lots of products in the past that were bottlenecked on hardware costs, and then that bottleneck went away. I expect we’re going to see something like that here. We’re in the early days here. But a lot of the time, they’re cheap compared to a human doing the same type of task. And so it’s like, “Is it valuable enough? Is it something you wouldn’t have a human do? Is it valuable enough to have a machine do it?” And for a lot of stuff, the answer is yes. I think we’re going to see really fast adoption here.
Emmet: You talked about Gmail and the email limit. Famously, it was launched on April Fools’ Day, and people wondered whether it was an April Fools’ joke that they were giving you a gigabyte of storage. All of these new technical capabilities unlocked new interface possibilities. Now that you have a gigabyte, you don’t have to archive or file things into folders, you can just search, and everything can go in threads, so it changes the nature of the product that’s possible.
AI is going to open up a whole bunch of new products. In the early days, we’ll probably see a bunch of products retrofitting themselves, and we did this as well. “What’s the easiest opportunity? We’ve got this often-used text box in our product. Let’s add the ability to summarize, rephrase, shorten,” blah, blah, blah. We added that, and our customers loved it because it’s a great use case when you’re talking to your customers. Every text box on the internet that needs one will probably have one soon.
“I personally feel like user interfaces are likely to go away. Designers won’t design user interfaces – AI agents will design user interfaces”
What are the next-level things? From an interface point of view, what will be possible? You’re talking about a lot of money flooding in that’s going to enable new types of products. We’ve been talking about conversational commerce, and at Intercom, we have spent a lot of time thinking about bots. Aside from the raw technical capabilities, it’ll open up a whole offshoot of the evolution of software because you can build very different types of software with this now.
Fergal: I think that change could come quite fast. As a thought experiment, imagine you had an intelligent human you work with a lot, who knows you and your preferences, and you were interfacing with them – they were driving the computer, and you were telling them what to do. What would that look like? A lot of the commands you would give would be verbal. Sometimes, you might reach down and say, “Oh, let me just take over the mouse from you,” but most of what you’d say would be high-level and verbal. But then you’d look at the screen to see the output. If someone has a bar chart with a bunch of data, you don’t want to describe that verbally – you want to see that visually.
I think we’re going to end up in a future where a lot of the input to the computer is verbal, and a lot of the output is going to be customized on the fly. It will probably be text because it’s really fast, but I personally feel like user interfaces are likely to go away. Designers won’t design user interfaces – AI agents will design user interfaces. If the agent feels you need to see a bar chart to make sense of the data, it’ll render a bar chart. Otherwise, it’ll render stuff in a very ad-hoc way. You basically get an interface customized to the task you want and what you’re familiar with rather than something designed by someone.
“You will probably end up with an agent that navigates the software for you, and that’s going to be better than navigating the software for 99% of the use cases”
Emmet: That’s very plausible. We imagine that everything will become text-first now, and in fact, it means, “You’ll have everything you have today plus a whole other set of things that are now text-first as well.” I think it’ll be largely additive rather than upending things.
Fergal: I don’t agree. I think there’s going to be an upending moment here. I think every complex piece of software is going to have some sort of freeform texting where you describe your task, but I think it’ll change. You will probably end up with an agent that navigates the software for you, and that’s going to be better than navigating the software for 99% of the use cases.
Emmet: That’s super different from the LLMs we’re used to working with today in an important way. Today you talk to them, they give you text back, and that’s it, but you’re describing a world that maybe we’re just starting to creep into with ChatGPT plug-ins where they’re starting to act on your behalf.
Fergal: I think it’s wrong to say you put text into them, and they give you text back. The really scrappy interface to ChatGPT and GPT-4 looks like that due to an accident of history. And on a technological level, they do, in fact, do text completion, but that’s going to disappear pretty fast. That’s not how we use Fin. In Fin, the LLM is a building block deep down. You talk to a bot, sometimes you click buttons together to do stuff, and you’re going to see that again and again.
Initially, the fastest way to integrate LLMs is text input/text output, but they’re just going to become a building block. Medium-term, LLMs are an intelligent building block that people learn to use to get software to do intelligent things. Long-term, you’re probably going to end up with an intelligent agent; your browser is probably going to turn into an intelligent agent.
Emmet: And the agent is clicking on coordinates on the screen for you.
Fergal: Probably initially, for backward compatibility. But then, I think, you just build APIs. Why would you build websites?
Emmet: That’s what the logical part of my brain thinks, but most of the software we build today is built using HTML, which was never designed for that. It’s also an accident of history that we’re building software applications using a markup language with loads of other stuff sprinkled on top. Maybe we’ll just end up building what we have.
Fergal: I’m sure it’ll be there as some compatibility or some intermediate layer.
Emmet: Or a fallback or something like that. What we’re talking about there, to be clear, is looking at a picture of what’s on your screen, finding the text that says “click here,” and simulating moving your mouse to actually click on the “click here” for you? Is that what you mean by an agent acting in the browser?
“We won’t really care what it’s like down underneath the hood. We just know we can ask for what we want, and it’ll complete the task”
Fergal: No. And again, this is speculative, but imagine there’s a legacy government website you want to get something done on. For example, you need to update your bank account details. What you do is say to your agent on your phone or desktop or browser, “Hey, I need to update my bank account on the government’s social security website.” Your agent goes, “Okay, done.” In the background, your little intelligent agent went and drove the website; it didn’t show that to you. After a certain point, people working in the government are going to be like, “Well, why do we need to keep building the websites? We just need to build the API.”
Emmet: Right. LLMs are a pretty awesome API to an API, in a sense. You can layer it on top, and it’s just a more human-readable API to any machine-readable API.
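To make that “API to an API” idea concrete, here is a purely illustrative sketch. The routing function below is a hard-coded stand-in for a real LLM call, and the endpoint names and data are invented for illustration – the point is only the shape of the layering: free text in, structured call in the middle, task result out.

```python
# Hypothetical sketch: an LLM as a "human-readable API to a machine-readable API".
# mock_llm_route() is a hard-coded stand-in for an actual LLM; the endpoints and
# the sample statistics are invented for illustration.

def mock_llm_route(request: str) -> dict:
    """Stand-in for an LLM that maps free text to a structured API call."""
    text = request.lower()
    if "bank account" in text:
        return {"endpoint": "update_bank_details", "params": {}}
    if "unemployment" in text:
        return {"endpoint": "get_statistic", "params": {"series": "unemployment"}}
    return {"endpoint": "unknown", "params": {}}

def machine_api(call: dict) -> str:
    """The underlying machine-readable API the agent drives on your behalf."""
    handlers = {
        "update_bank_details": lambda p: "bank details updated",
        "get_statistic": lambda p: f"series={p['series']}: [5.1, 4.8, 4.5]",
    }
    handler = handlers.get(call["endpoint"], lambda p: "no matching endpoint")
    return handler(call["params"])

def human_api(request: str) -> str:
    """Natural language in, task result out - the layering Emmet describes."""
    return machine_api(mock_llm_route(request))

print(human_api("I need to update my bank account on the social security website"))
```

In a real system, the routing step would be an actual model call (and the hard part is making it reliable), but the division of labor – language model for intent, conventional API for execution – is the same.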
Fergal: Yeah, exactly, but I’d phrase it differently. The intelligence we happen to have comes in the form of LLMs at the moment, but that’s going to get abstracted away. We won’t really care what it’s like down underneath the hood. We just know we can ask for what we want, and it’ll complete the task. If you say to it, “What was the unemployment rate in Ireland over the last 10 years for people in their 20s?” It’ll go to the Central Statistics Office website, download the data, parse it, render a graph, and so on.
I have a talk coming up, and I needed a graph. I spent time on Google trying to find the exact one I had in my head, writing my search query in Google, and after two minutes, I just couldn’t find the right graph. So, I went to GPT and said, “Generate me the following graph.” It generated the plotting code, and I just put it into my notebook. I copied and pasted my graph and put it in my presentation. The fastest way for me to get the graph I wanted was to have an intelligent system generate the code. That was faster than trying to find it on Google. There’s a lot of interface friction, but that’s going to go away, and you’re going to end up with a really fast agent that accomplishes tasks. Once you have that, it’s going to eat your current software stack.
Emmet: I’m understanding what you’re saying a little bit better, but I don’t see all software being reduced to a text input box because that’s the wrong input and output modality for a lot of stuff, including what you just described. A good example is all the image generation stuff, which is loads of fun to play with, but you’ve got to go onto a Discord bot to engage with Midjourney and hack it by writing f-stop 1.4, hyper-realistic… No, this is fundamentally a visual thing I’m trying to create. I want a more tactile UI. I want more knobs and dials. What are the properties of it that I can dial up and down and play with rather than feeling my way blind in effectively a command line interface? Because the lack of affordances in a command line interface means it’s often not the best UI.
Fergal: But in the future, there would probably be something you say to your agent like, “Hey, I want to edit those photos I took yesterday.” And it knows you and your level of sophistication. It knows that when you want to edit your photos, you’re looking for four filters and a crop tool, or alternatively, it knows that you want to do super prosumer stuff. It goes and looks in its pattern library for the best interfaces for each of those and renders that interface for you.
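A toy sketch of the pattern-library lookup Fergal is describing. Everything here is invented for illustration – a real agent would infer the task and the user’s sophistication rather than read them from a dictionary – but it shows the selection step: task plus user profile in, interface spec out.

```python
# Toy sketch of an agent picking an interface from a pattern library based on
# the user's sophistication. The specs and skill levels are invented for
# illustration; a real agent would infer them, not look them up.

PATTERN_LIBRARY = {
    "photo_editing": {
        # The casual user gets "four filters and a crop tool".
        "casual": ["crop", "filter:warm", "filter:cool", "filter:b&w", "filter:vivid"],
        # The prosumer gets the full toolbench.
        "pro": ["curves", "layers", "masking", "raw-develop", "batch-export"],
    },
}

def render_interface(task: str, user_level: str) -> list:
    """Pick the best-matching interface spec for this task and this user."""
    specs = PATTERN_LIBRARY.get(task, {})
    # Fall back to the simplest interface when the agent doesn't know the user yet.
    return specs.get(user_level, specs.get("casual", []))

print(render_interface("photo_editing", "casual"))
print(render_interface("photo_editing", "pro"))
```

The interesting design question Emmet raises next is exactly what sits inside that lookup: a hand-designed spec per level, or an interface generated on the fly.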
“It’ll entirely depend on the task you’re doing. If you’re a pilot, you’re not going to want to go, ‘Time to land the plane! Hey, LLM, auto-assemble an interface for me to do it'”
Emmet: And then you’re saying, “Actually, I want it a bit more professional.” And it goes, “Okay, I’ll give you the pro version of the UI.” And it dynamically renders that.
Fergal: Look, there’ll be some tasks you do where you don’t want to learn to use the interface. Des was talking about this recently in a different podcast. You need to update your vacation time in Workday, and you don’t want to learn an interface for that. You just want the task complete. There’ll be other things where, for example, you’re a professional programmer and you need to learn to use an IDE. Some designer has thought in great detail about what you’re going to want and need to do, and there’s probably some light layer of customization there, but there’s still a well-designed interface you are going to learn to use. I think that interfaces for the former – the tasks you just want done – are going to disappear, or a lot of them are going to be rendered on an ad hoc basis. For the latter, yeah, they’ll be adaptive.
Emmet: I agree with all of what you said. An additional nuance also occurs to me: it’ll entirely depend on the task you’re doing. If you’re a pilot, you’re not going to want to go, “Time to land the plane! Hey, LLM, auto-assemble an interface for me to do it.” There’s going to be regulation and things like that, I’m sure. But that does reflect one of the big differences, which is that we’ve always thought of computers as highly deterministic, binary, on/off switch-driven truth machines, and now, suddenly, the nature of that is shifting a lot. And that’s a big change, as well as all the stuff that we’re describing – what you can expect, how you can expect it to work for you personally, and the amount of fungibility or control you have over it. I think we will start to see lots more exciting experimentation and divergence, and the level of customization we have today, where you can change your wallpaper or the font size, will probably pale in comparison.
Towards the center of the circle
Emmet: You also said something interesting I wanted to come back to. Imagine designers mostly assembling from a library. The task of user interface design is interesting because we’ve been setting ourselves up for that with design systems. A design system is a pattern library of components. If you’re building a big product, you want it to be consistent, and you want to be able to put it together fast. So a lot of the groundwork we’ve been laying and the systems we’ve been building – design teams, and probably engineering teams as well, building components that can quickly be reused – is all pointed towards our ability to build these tools fairly quickly. What you were describing is something that takes your design system and builds a UI from it, and it doesn’t seem miles away.
Fergal: Or maybe it takes the standard open-source design system and builds a tool from it. I don’t know if this will happen at the level of individual companies or if it’ll happen at a broad horizontal level.
Emmet: Yeah, that would be so boring. It would be tragic. Before iOS 7, we had skeuomorphism and everything, then Apple went super-opinionated flat design, and the entire industry was so influenced by Apple’s dominance that all the websites started to look the same. Apple released their human interface guidelines and said, “Look, iPhone apps should look like this now.” But it led to a flattening of diversity and a more boring web, in my opinion. And that was in service of these systems that can build themselves.
Fergal: You’d be able to tell your agent that you want it to look funky and retro. You’ve got to imagine that’ll come, and I think things will get way more customizable in terms of what people actually use because you have an intelligent layer that understands how to construct an interface with a given theme. You could probably do that today. If you set off today to build Midjourney for user interfaces, you could probably do it. We’ve got GPT-4 that can generate code or CSS to write user interfaces, and we’ve got the image synthesis models where you embed all the images and the text, and you sort of squish them together. I bet you could build something pretty fast.
Emmet: It’s so funny because you’re saying this, and my emotional reaction is like, “No, you don’t get it; you have to think about usability and understanding humans and all this kind of stuff.” And then I’m like, “Yeah, they’re the reasoning capabilities we talked about, and it seems like it has them now.” And so as we’re talking about it, I’m having that emotional…
Emmet: The AI is coming for your discipline. But I’m honestly not that worried about it because I think a lot of designers, and I’ve heard this said for programmers as well, are not going to mourn the grunt work that this largely makes faster and improves. It actually allows them to maybe go up a zoom level and think a bit more about the solution rather than the execution of the solution. Building products is still super laborious and super time-consuming, and I think it’ll be great to see what happens if we take some of the grunt work out of that.
Fergal: I mean, there’s this whole debate around jobs and job displacement and job change, and something’s going to happen here. When I hear that, I’m like, “Oh, maybe that means you don’t need designers anymore – maybe you just need product managers.” And a product manager can now do everything a designer used to do. Maybe you don’t need a programmer – maybe you just need a product manager. And we all turn into product managers in the future. I don’t know. Maybe there will be a lot more roles and jobs like that, or maybe there will be fewer.
Emmet: I think we should lean into that. One thing I noticed in my career is that the more senior you become, the less specific to your discipline you are. You have to become more of a general leader.
Fergal: I had this conversation with someone on the design team. When you’re junior in a discipline like engineering or product or design, you’re at the edge of a circle. And then, as you get more senior, you get more and more towards the center. In the center of the circle is the product. And so, as you get more and more senior, your world becomes more and more about the product you’re building and less and less about the angle you’ve come from.
Emmet: I can see that too. So, we’re all going to become PMs, is that the plan?
Fergal: Yeah, I mean, ultimately, that’s what we’re trying to do in a job like this.
Emmet: I mean, what is a PM if not a product person without any directly applicable practical skills, am I right, Fergal?
Fergal: Yeah, I know. What’s a PM?
Emmet: I think we should wrap up. Cheers, Fergal.
Fergal: Thanks, Emmet.