The GPT effect: A new era of customer service

Audio Content Producer, Intercom

March 9, 2023

Recent breakthroughs with large language models have exceeded all of our expectations. We’ve brought together industry experts for a conversation on GPT and how it will shape the future of customer service.

We can’t say it caught us by surprise. For years, the industry has lauded the potential for AI and machine learning to radically transform the way we work, especially as advancements in computing power and data storage made it possible to train larger and larger models. But we weren’t quite expecting just how quickly the recent advancements in OpenAI’s ChatGPT would unlock new possibilities.

At Intercom, we’ve always shipped to learn. Only by shipping new features quickly can we get proper feedback, learn from it, and iterate again and again to better serve our customers. And so, naturally, that’s what we did with this new tech. Over the past couple of months, we shipped a few AI-powered features to 160 customers. And while it’s still too early to tell how these large language models (LLMs) will play out in our lives, we believe we have reached a crucial inflection point – especially when it comes to customer service.

And so, last week, we hosted a webinar to dig a little deeper into the business use cases of GPT. Is this wave of innovation any different from past waves? Will it transform the way we work and the way businesses interact with customers and prospects? Can it spark a new generation of startups? To give you a little more insight, we’ve invited a couple of bigwigs in the startup scene to weigh in.

In today’s episode, you’ll hear from:

Ethan Kurzweil, Intercom board member and Partner at Bessemer Venture Partners
Fergal Reid, our own Director of Machine Learning
Krystal Hu, VC and Startups Reporter at Reuters
Talia Goldberg, Partner at Bessemer Venture Partners

They’ll talk about large language models like ChatGPT, how businesses are incorporating this technology, and how it will shape the future of the customer service industry.

Short on time? Here are a few key takeaways:

We’re starting to see the sticky use cases of large language models – there’s great potential for augmenting customer service due to its regularity and use of natural language.
For now, large language models are expected to augment human capabilities rather than replace them, as they can help make professionals more productive and efficient in their work.
While it’s still too early to measure the success of Intercom’s beta experiment, adoption and usage of the latest AI-powered features have been huge and early feedback is very promising.
Large language models can get very expensive very quickly. Still, over time, they’ll become cheaper and more ubiquitous, allowing for more experimentation and discovery.
While there are still issues with hallucinations, you can configure and constrain these models to make them more trustworthy when the situation requires a higher degree of confidence.
Models aren’t one-size-fits-all. It’s likely that, in the future, companies will run a bespoke mix of different, customizable models that suit different business problems.

If you enjoy our discussion, check out more episodes of our podcast. You can follow on Apple Podcasts, Spotify, YouTube or grab the RSS feed in your player of choice. What follows is a lightly edited transcript of the episode.

The rise of ChatGPT

Krystal Hu: Thank you so much for everyone taking the time to join. I am Krystal Hu, and I cover venture and startups for Reuters. As many of you guys know, AI and the wave of ChatGPT have burst into the scene in the past few months, and a big part of my job is to figure out the technology and how it’s changing different aspects of life. For today’s topic, we’ll be focusing on how ChatGPT will shape the future of customer service. We’ll discuss what exactly ChatGPT and large language models are, how this technology will be used, the impact it will have on existing and future technologies, how startups are incorporating this technology, and how new companies are being built.

We have a great panel today with us. Two amazing investors from Bessemer: Talia Goldberg and Ethan Kurzweil. Talia is based in San Francisco, invests across consumer internet and software businesses, and works with companies like ServiceTitan and Discord. Ethan Kurzweil is also based in San Francisco, and he leads investments in a variety of verticals, including developer platforms, new data infra, digital consumer applications, and crypto.

And then, we’re going to have the Director of Machine Learning at Intercom, Fergal Reid, giving us an inside look about how Intercom is incorporating this technology in its latest offerings – including a few AI assistant features. I look forward to picking their brains and hearing what they’re seeing on both the startup and venture front and the changes GPT might bring. Throughout the process, if you have any questions, feel free to pop your question in the chat, and then we will have about 15 to 20 minutes at the end of the conversation to go through the questions.

I guess I’ll start with you, Fergal, because you are the technologist in the room, and you’re on the front line of incorporating GPT into Intercom’s offerings. Maybe you can start by giving us a bit of background and explaining what is GPT and ChatGPT, and how did it come about for you to incorporate this technology?

“I’m not going to code rules, and I’m not going to specifically say, ‘Learn to predict X versus Y’”

Fergal Reid: It’s a very exciting time in technology. I’m going to presume a lot of people have probably seen ChatGPT at this point because it’s just made such a big wave. But from the technology perspective, from my narrow view of the world, I’ve been at Intercom for about five years, and I run the machine learning team. And the machine learning things we’ve done are using algorithms that have been around a while – using supervised learning algorithms, algorithms that learn to tell apart things. You can be like, “Hey, let’s predict whether someone’s going to ask for one thing or another.” With these machine learning systems, you give them a lot of training data: “Hey, this is an example if someone asked you one question, and this is an example if someone asked you another question.”

And what’s new and different with this latest wave of generative AI is that instead of just teaching a model to predict one thing or another, you’re saying, “Hey, model. Learn how to generate new data of this type. Learn how to generate an image.” You give it some text and it learns to generate an image that maps to that text, or, with ChatGPT, you just talk to it and give it some text, and it gets pretty good at generating more text in response to that.

“We’ve got this really big model, we ask it questions in English, tell it to do things in English, and it’s pretty good at just doing what we tell it to”

It’s just a different way of doing machine learning. I’m not going to code rules, and I’m not going to specifically say, “Learn to predict X versus Y.” Instead, I’m going to take a really large amount of training data, make a model that’s very good at trying to predict that training data, and then, hopefully, I can get it to do useful things by generating new examples.

With ChatGPT, you ask it something by giving it some text and saying, “Generate what comes next.” And surprisingly, that’s pretty useful. You can say, “Hey, here’s a customer support conversation, and this is the summary of the support conversation,” and then give it to ChatGPT, and it’ll generate what happens next or what it would expect to see next. And perhaps, you say, “This is the summary,” and then a summary pops out. And that’s very useful. It’s a very general way of building features and systems. Instead of coding a new machine learning system for every little thing, we’ve got this really big model, we ask it questions in English, tell it to do things in English, and it’s pretty good at just doing what we tell it to. And so, at Intercom, we’ve been trying to use that to build product features.
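The "generate what comes next" idea can be sketched as plain prompt construction. This is a minimal illustration, not Intercom's implementation: the function name `build_summary_prompt` and the sample conversation are hypothetical, and the call to an actual model is deliberately left out.

```python
from typing import List, Tuple


def build_summary_prompt(turns: List[Tuple[str, str]]) -> str:
    """Lay out a conversation so that a summary is the natural continuation."""
    lines = ["Here is a customer support conversation:", ""]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    # End on the cue from the discussion; the model is expected to
    # continue the text from here with the summary itself.
    lines += ["", "This is the summary of the support conversation:"]
    return "\n".join(lines)


# Hypothetical sample data for illustration only.
conversation = [
    ("Customer", "I was charged twice for my subscription this month."),
    ("Agent", "Sorry about that. I have refunded the duplicate charge."),
]
prompt = build_summary_prompt(conversation)
print(prompt)
```

The resulting string would be sent to whichever text-completion model is in use; the same pattern works for other tasks by changing the closing cue.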

A game changer for customer service

Krystal Hu: I want to bring Talia and Ethan to the stage as prolific investors in the space. You have seen a couple of technological waves. How is this one about generative AI different, and what are the areas of applications you’re excited about?

Talia Goldberg: Sure, thanks for having me. That was a great overview of what generative AI is. It’s funny, just before this meeting, I was looking at a piece we published on our blog last summer, maybe eight or nine months ago, and this was a few months before ChatGPT even launched, but we were starting to see a lot of momentum and reason to be excited about what was happening with large language models in particular, and the potential of AI and generative AI as this new really powerful wave of artificial intelligence.

And we had a prediction: “Today, less than 1% of online content is generated using AI, and within the next 10 years, we predict that at least 50% will be generated by or augmented by AI.” And we were debating that, and we thought that it was a wild thing to say, but holy shit, we underestimated how quickly AI can transform a lot of the information we see. I’d say it could be 50% within the next two years of our online interactions, content, and media. The implications of that, I think, are vast across a lot of information and knowledge work, including customer support.

“You see the sticky use cases right away where the technology is ripe for disrupting, improving, augmenting, and making better, and customer support is straight down the fairway for that”

Krystal Hu: Ethan, you have been working with Intercom for a while. Is this the moment you think customer service has been waiting for? Because I feel like the technology and opportunity are golden for customer service applications like Intercom.

Ethan Kurzweil: Yeah, I feel like this is maybe the bleeding edge application of large language models and what they can do. If you step way back and think about technology changes and platform shifts like the smartphone moment, the iPhone moment, and things like that, what happens early on is that there’s all this excitement and lots of developers and creators rush into a space, and then you have this washout where you see which are the bleeding edge applications where it sticks first, and the ones where it doesn’t get you into a little bit of a trough of disillusionment. I think we’re probably still a little early on that curve, but you see the sticky use cases right away where the technology is ripe for disrupting, improving, augmenting, and making better, and customer support is straight down the fairway for that.

I’ve worked with Intercom now for almost eight and a half years, and Intercom’s been a team that’s always been at the forefront of adopting new technologies when they’re ready. And I remember that two or three years ago, people said, “Automation, automation, automation.” And the product leadership at Intercom always said, “It’s not good enough yet. We can do it, we can stick it in in such a way that we could check a box on some feature request form, but it won’t lead to a really good human-like flow.” Intercom’s always been founded around this idea of making internet business personal. And if you have a bot that doesn’t sound personal, that’s orthogonal to that.

The fact that Intercom’s using it so successfully in their flow shows you that the technology is ready and that this is one of the many, many things we’re going to see it impact. Not everything all at once right away, but over time, we’ll see much more impact by giving a machine the ability to converse in a human-like way.

“You look at the curve and rate of improvement, and it’s going to be even better a few months from now, a few quarters from now, and a few years from now”

Talia Goldberg: If I can add one thing, I think customer support is the perfect initial area for AI to start having an impact. And one of the reasons for that is that it uses natural language. You can communicate with the AI using English, and it will respond in English. You don’t need to code – it generates information. And that’s what customer service and support is like – generating great, human-like experiences that can be personalized, resolving complaints, and getting better and better over time. So, you also get this great feedback loop by using it in customer support.

Even though there may be some challenges and things that are rough around the edges today, the technology and the potential are already really great, as Ethan said. You look at the curve and rate of improvement, and it’s going to be even better a few months from now, a few quarters from now, and a few years from now. It’s one of the categories we’re most excited about, and we think every business can take advantage of it and needs to be thinking about it.

Krystal Hu: Fergal, this is the right timing for you to give us an overview of the recent feature launch at Intercom and how you incorporated ChatGPT into it.

Fergal Reid: Absolutely. And just to echo Talia and Ethan’s sentiments here, there’s so much structure in the domain, there are so many things that a customer support agent does where they’re doing the same thing they’ve done the last day again, or that maybe one of their teammates has done before, and there’s so much regularity and structure that it feels really ripe for a system that learns and uses AI to make people faster.

“We felt that the best place to get started was with a human in the loop. Someone’s wrapped in the Inbox and we want to make them faster, but they’re still able to check and approve it”

When ChatGPT launched, at the same time, OpenAI released this new model for developers’ use, text-davinci-003. We’ve had a relationship with OpenAI for a long time, and we felt, when we looked at that model, that it was really crossing a threshold of usefulness and that we could build with it. And so, we did some initial benchmarking. People spend a lot of time in the Inbox, and one thing they have to do a lot is write summaries of the conversation they just looked at before handing it over. This technology seemed to be really great at doing conversational summarization, and we were like, “Can we build a feature that does this and get it out to our beta customers?” Intercom has this principle of “ship to learn.” We believe in shipping new features extremely fast to customers, so we can learn whether it’s solved a problem or it’s more of a curiosity.

And so, basically, in early December, we started a project to see if we could quickly ship some features that would work with customer support reps in the actual Inbox to make them faster. One was summarization, with other features around helping them compose text faster. And we really felt it was the right place to start with this technology because generative AI does have a downside. It’s not always as accurate as you might think. It’s easy to look at ChatGPT, ask it a question, it gives you a response, and you think, “This is amazing.” And then you read it with a little bit more detail, and actually, sometimes, it gets things wrong. We felt that the best place to get started was with a human in the loop. Someone’s wrapped in the Inbox and we want to make them faster, but they’re still able to check and approve it. It was a great starting point.

Now, I see people asking in the comments, “Hey, what about bots and things that can answer questions themselves?” We think that’s coming and it may be coming soon, but we’re still exploring it. The big issue for us is accuracy. We feel it’s ripe right now to have a human in the loop where it makes the support rep faster. And probably, coming soon, are things that go down that next step. That’s a very interesting area.

Ethan Kurzweil: To riff on that, we’re getting some interesting forward-looking questions like, “Will this make my days numbered as a copywriter?” I don’t think so at all. Where this technology is and is likely to stay for a while is in augmenting human capabilities and human intelligence, making you more productive as a copywriter but not necessarily replacing you because, first of all, the technology’s not there yet, and second of all, the bar for what amazing customer support or any communication with a business is is just going to go up and up as we have these resources. While the tech may be able to handle some copywriter and support response use cases on its own, the bar for what’s going to be really good copy and really good support and so on and so on is going to rise as we have access to these technologies. The ideal state is that you’re going to have access to these technologies to be more productive, but it’s not going to replace you anytime soon.

Talia Goldberg: Yeah. I love how Wyatt just said it’s an ability multiplier. We talk a lot internally about the example of Copilot, which is like auto-complete for coding, and it’s already making engineers significantly more efficient. It doesn’t replace engineers or engineering at all, but it can augment it. A very basic example of that might be the calculator. Back in the day, we used to do math by hand. Now we use calculators, but math is still very important – we all need to learn it, and mathematicians are very important in this world. Arguably, your role may become even more important because as the cost to create content goes down and there’s a flood of a lot of different content and information, creating content and information that can stand out and rise above is going to be at an even greater premium over the next few years.

Intercom’s experiment with GPT

Krystal Hu: It’s been a few weeks since Intercom launched its AI-assisted features. What’s the early feedback you have seen? How do you measure the success of incorporating this technology?

“We’re seeing a lot of adoption, a lot of excitement, and a lot of usage”

Fergal Reid: I’ll be very transparent about that – I don’t have a fully satisfying answer to that question yet. What I can tell you is that we’re now live, we have thousands of customers using it regularly – we’ve had a lot of adoption. We will likely try and measure if this actually made people more productive, because let’s say, for our own CS team, we can gather telemetry on, “Are you faster if you use these features?” and put together some form of a controlled experiment for that. We always like to try and get some form of actual data on this at some point, but we’re not at that point yet. We’ll probably have some numbers on that or more of an understanding of it, at least internally, in a month or two, I would guess.

What I can tell you at the moment is we’re seeing a lot of adoption, a lot of excitement, and a lot of usage. There are definitely some features like summarization that customers tell us save them substantial time. We have had customers tell us things like, “Hey, for some conversations, it can take as long to write the summary for a handover as it takes to resolve the end user’s issue.” And so, we definitely feel good about that.

In some of our other features, you write a shorthand, a little bit like GitHub Copilot. We were inspired by Copilot, and in Copilot, if you’re a programmer, you can write a comment or shorthand, and then it will fill out the code. One of our features is “expand,” where you write a shorthand, and it turns it into a longer support message. Sometimes, that works and saves people time, but we don’t have data on it yet. What we have live at the moment is just a Generation 1 version of that. And we have prototypes of a Generation 2 version. At the moment, you write the shorthand, and the large language model expands that out. What we’re trying to do instead is say, “Hey, let’s pull in the last time you answered a question like that. Let’s pull in macros that are relevant to this.” And we have some internal prototypes that are working pretty well. We’re still innovating and doing things that are going to really move the needle, but we don’t have metrics yet. Soon.
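One way to picture the "pull in the last time you answered a question like that" idea is a small retrieval step in front of the prompt. The sketch below is an assumption-laden illustration, not the prototype described above: it uses simple word-overlap (Jaccard) similarity as a stand-in for whatever retrieval a real system would use, and every name and sample reply in it is hypothetical.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on whitespace; crude, but enough for a sketch."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_relevant(shorthand: str, past_replies: List[str]) -> str:
    """Pick the past reply with the highest word overlap with the shorthand."""
    query = tokenize(shorthand)
    return max(past_replies, key=lambda reply: jaccard(query, tokenize(reply)))


def build_expand_prompt(shorthand: str, past_replies: List[str]) -> str:
    """Put a relevant earlier reply in front of the shorthand to be expanded."""
    example = most_relevant(shorthand, past_replies)
    return (
        "Here is a reply the agent wrote to a similar question:\n"
        f"{example}\n\n"
        "Expand this shorthand into a full support message in the same style:\n"
        f"{shorthand}\n"
    )


# Hypothetical past replies and shorthand.
past = [
    "You can reset your password from the Settings page.",
    "Refunds usually take five business days to arrive.",
]
expand_prompt = build_expand_prompt("refund takes five days", past)
print(expand_prompt)
```

Relevant macros could be retrieved and added to the prompt in the same way.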

“I have a chart in Tableau of our daily spending with OpenAI that we keep a nervous watch on”

Krystal Hu: To follow up on that, how do you measure the cost of it? As I understand, you probably send inquiries to OpenAI and they charge, I guess, two cents per thousand characters or something like that. And I guess, as your adoption rises, that bill also piles up. Do you have any learnings or observations to share about incorporating this technology?

Fergal Reid: I have a chart in Tableau of our daily spending with OpenAI that we keep a nervous watch on. It’s definitely a consideration. I mentioned the summarization feature, and we’ve built it in a very human-in-the-loop way where you’ve got to ask for the summary before you hand over the question. And one thing our customers say to us is, “Hey, why do I have to ask for this summary? Please just maintain a summary at all times in the sidebar so I never have to ask for it.” And that would get really expensive because if we had to pay two cents every time someone said something new in the conversation and the summary changed, that would get extremely expensive. We absolutely have to take the cost into consideration in a way we don’t with more traditional machine learning models.
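The arithmetic behind that concern can be made concrete. The sketch below takes the "two cents" figure from the conversation as an illustrative per-call cost, not a published price, and the conversation volume and message count are made-up assumptions.

```python
# Illustrative per-call cost from the discussion, kept in whole cents
# so the arithmetic stays exact. Not a real price list.
COST_PER_SUMMARY_CENTS = 2


def monthly_cost_cents(conversations: int, summaries_per_conversation: int) -> int:
    """Total spend in cents when every summary is one paid model call."""
    return conversations * summaries_per_conversation * COST_PER_SUMMARY_CENTS


# Hypothetical volume: 10,000 conversations a month, 12 messages each.
on_request = monthly_cost_cents(10_000, 1)   # one summary, at handover
always_on = monthly_cost_cents(10_000, 12)   # re-summarize after every message

print(f"on request: ${on_request / 100:,.2f}")
print(f"always on:  ${always_on / 100:,.2f}")
```

Under these assumptions the always-on sidebar costs as many times more as there are messages per conversation, which is why the on-request design is the cheaper starting point.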

That said, OpenAI just announced their ChatGPT API, and I think it surprised a lot of people because it was 10 times cheaper than the previous similar models in that series. It’s possible that the cost drops pretty fast and these features become widely adopted. What about other startups or companies building in this area? The advice that we would give at Intercom is to try and get in the market fast because there’s real value here for your customers that you can build and unlock. And the cost will probably come down either because the models will get cheaper as vendors like OpenAI figure out how to make them more efficient or because you’ll figure out more efficient ways to use them. You’ll figure out ways of saying, “Hey, I can use a cheaper generative model for the first part of the conversation, and then, when I have this much harder task that requires more accuracy, I’ll use the more expensive one.” Ethan and Talia probably have a much broader view of that than I do, and I’d love to hear their thoughts.
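The "cheaper model first, expensive model for the hard task" idea amounts to a routing decision per step. Here is a minimal sketch under stated assumptions: the two model names are placeholders, the 1-to-10 cost ratio is made up for illustration, and how a step gets flagged as needing high accuracy is left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    name: str
    relative_cost: int  # hypothetical cost units per call


# Placeholder models; names and costs are assumptions for illustration.
CHEAP = Model("cheap-model", 1)
EXPENSIVE = Model("expensive-model", 10)


def pick_model(needs_high_accuracy: bool) -> Model:
    """Route hard, accuracy-critical steps to the expensive model."""
    return EXPENSIVE if needs_high_accuracy else CHEAP


def conversation_cost(steps: List[bool]) -> int:
    """Sum the cost of a conversation, one routed call per step."""
    return sum(pick_model(step).relative_cost for step in steps)


# Four routine steps followed by one step that needs more accuracy.
steps = [False, False, False, False, True]
routed = conversation_cost(steps)
all_expensive = len(steps) * EXPENSIVE.relative_cost

print(routed, all_expensive)
```

In this toy case routing costs 14 units against 50 for sending every step to the expensive model; the real saving depends entirely on how often the hard case occurs.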

“You’re never sure what developers are going to do with a new technology until they have it – and have it where they’re not paying two cents every time they make an API call”

Ethan Kurzweil: Well, it’s a good example of what you sometimes see with these bleeding-edge technologies. In the beginning, the high-value use cases get them, and you’re describing the actualization of that principle. At Intercom, that’s the summary feature when requested today. But over time, the technology will be much more ubiquitous and cheaper. And that’s when it can proliferate into more use cases where the marginal cost of doing it is prohibitive today, and that allows developers to discover other applications of large language models in this type of AI where we’re not really predicting.

At Bessemer, Talia and I try to come up with roadmaps of where we think technology will go, but as a developer-oriented investor, one of the key primitives I always think about is you’re never sure what developers are going to do with a new technology, a new platform, a new access to something until they have it – and have it where they’re not paying two cents every time they make an API call – and can riff and do things that sound absurd at first.

I’m excited about the technology getting to the point where there’s just a ton of experimentation. I’m sure that in Intercom’s product roadmap, not today, but a year from now, there’ll be some things that we didn’t predict but have a really high value for customers. And there’ll be some startups that just came out because they riffed on some particular way you can use generative text, and it created a really great user experience for somebody.

Talia Goldberg: There’s a fun example that I think can emphasize some of the human-like potential to augment experiences that’s relevant to support. If I’m talking to, let’s say, some of the Intercom team with strong Irish accents, and they probably think I have a crazy Western accent, it’s hard for us, at times, to understand each other when we’re super excited and talking really fast. It sounds like a different language even though everyone’s speaking English. AI can, in real time, change the accents of a person a bit to make it more understandable in both ways. So, if I have an Irish accent or a British accent, it will translate that into a California accent, and that can really improve the experience in some ways by lowering the barriers of communication.

Ethan Kurzweil: It’s a good example because technology is getting in the middle of direct communication but making it more human-like, which sounds like an oxymoron, but if deployed well, it could make you feel more connected in a messaging or communication context.

Talia Goldberg: This is the promise of the internet – bringing us all together and breaking down barriers. I really am a big believer in the potential to supercharge that.

The confidence quotient

Krystal Hu: I think a lot of people are having questions about how you make sure everything will be correct in terms of the information flow and that it will be accurate. The stake is different in different use cases, but, in general, you don’t want to provide wrong information to your customers. How do you ensure that?

“It’s not that you, as a human, can never see those things because that would be impossible – it’s that you’re able to filter appropriately. That’s how I think about large language models”

Talia Goldberg: Maybe just one comment, and then I think I’ll let Fergal answer more specifically about Intercom. The models are trained on enormous amounts of data – many billions and billions of points of data and information. And so, no matter how much you try and trick the data or put in false data, it’s still such a tiny, tiny portion of the overall data. That’s one thing to keep in mind as you think about how these models are created.

The other thing is the data inputs. I know there’s concern about whether it’s trained on data that’s incorrect, and don’t get me wrong, there are certainly challenges with hallucination and other areas, so there’s a lot to improve. But in your life, it’s not that you go around and don’t see things that might be wrong or biased or even misinformation. You do come across that, but you use your judgment and mind, and there’s a lot of other good data. And so, it’s not that you, as a human, can never see those things because that would be impossible – it’s that you’re able to filter appropriately. That’s how I think about large language models. There will be some instances in which there’s data and information that isn’t what you’d want in the training set, but the language models’ ability to filter it and get to the right answer should be better and better over time.

“That could be one of the parameters: ‘How much confidence do you have in this response?’ If it’s not good enough, don’t give it”

Ethan Kurzweil: There are some interesting questions both on data privacy and accuracy. The other thing to keep in mind on the data accuracy question before we get to the privacy part is that, in the future, and in some large language models, you can actually set an accuracy quotient. It’s kind of like when an AI was programmed to win Jeopardy – it had a confidence interval that it knew the answer to a question with 90% confidence or 60% confidence. And in that context, where you just lose some points with a wrong answer, they set the interval pretty low at 40% or something. If you’re 40% sure or more, what the hell, go and try to answer the question.

There may be some context where you want human-level accuracy, you set it there, and a lot of times, when the AI can’t get to the 99th percentile, it’ll kick over to a human or something like that. There may be some context even in the military, even in highly regulated industries, where you have more tolerance for an educated AI-assisted guess. And that could be one of the parameters: “How much confidence do you have in this response?” If it’s not good enough, don’t give it.
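The confidence parameter described here can be sketched as a simple threshold check. This is a hedged illustration only: it assumes some confidence score between 0 and 1 is already available for each draft, which is itself a hard problem, and the names `Draft` and `route` are hypothetical.

```python
from typing import NamedTuple


class Draft(NamedTuple):
    text: str
    confidence: float  # assumed to lie in [0, 1]; how it is estimated is out of scope


def route(draft: Draft, threshold: float) -> str:
    """Send the draft if it clears the configured bar, else hand off to a human."""
    return "send" if draft.confidence >= threshold else "handoff_to_human"


# Hypothetical draft with a made-up confidence score.
draft = Draft("Your refund was issued on Monday.", confidence=0.72)

lenient = route(draft, threshold=0.40)  # low bar, as in the quiz-show example
strict = route(draft, threshold=0.99)   # high bar for sensitive contexts

print(lenient, strict)
```

The same draft is sent under the lenient setting and handed to a human under the strict one, so the threshold becomes the knob a customer would configure.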

Fergal Reid: Just to come in on that, Ethan, that’s definitely a strong product belief we have internally in Intercom, which is that it’s quite likely that there will be a variety of tolerances out here. There’ll be some customers with quite a high tolerance for, “Give me the suggestion; it’s okay if the suggestion’s wrong occasionally.” And there’ll be other customers with a very low tolerance. We expect we’ll need to have some degree of configuration around this.

“We’ve got this new technology that can make much better predictions and do things much faster. How do we take that and make it trustworthy enough, or at least allow customers to choose?”

Just to dive into the weeds with some of the things we’re looking at in the future, let’s say you have something that tries to consume an article and answer a question about that content. One example is you constrain it to say, “You’re only allowed to respond with an exact quote from this.” And it can put that quote in context, but the quote’s got to be there. That’s a conservative way of using these new large language models to do a better job at understanding your query and retrieving the information, but constraining what they can actually say. Another example is you take a generative model and allow it to be generative underneath the hood, but it can only interact with an end user through a predefined series of actions or things it can say.
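The "exact quote" constraint can be enforced outside the model with a check on its output. The sketch below is one possible guard, not a description of any shipped feature: it assumes the model returns a candidate quote as text, treats runs of whitespace as equivalent, and uses hypothetical names and sample text throughout.

```python
from typing import Optional


def collapse_whitespace(text: str) -> str:
    """Treat any run of spaces or newlines as a single space."""
    return " ".join(text.split())


def validated_quote(candidate: str, article: str) -> Optional[str]:
    """Return the candidate only if it appears word for word in the article."""
    quote = collapse_whitespace(candidate)
    if quote and quote in collapse_whitespace(article):
        return quote
    return None  # caller should fall back, for example to a human agent


# Hypothetical help-center article.
article = (
    "To reset your password, open Settings and choose Security.\n"
    "Then click Reset password."
)

accepted = validated_quote("open Settings and choose  Security", article)
rejected = validated_quote("open Settings and choose Privacy", article)

print(accepted)
print(rejected)
```

A plain substring test like this can match in the middle of a word, so a production version would likely want sentence or token boundaries; the point is only that anything the model invents gets rejected before it reaches the end user.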

There are a lot of techniques to take the powerful engine and try to make it safer, more trustworthy and constrained. And I think you’re going to see a lot of people working with that technique. We’ve got this new technology that can make much better predictions and do things much faster. How do we take that and make it trustworthy enough, or at least allow customers to choose? I think you’re going to see a lot of movement in this space over the next couple of months.

Mass personalization across industries

Krystal Hu: On that note, Ethan, Talia, besides customer service, are there any other applications you’re seeing in this space that you’re particularly excited about?

Ethan Kurzweil: I can go first. Looking at some consumer applications, gaming is one we’re excited about. If you think about what makes games fun, a lot of times, that’s the refresh rate on new content, and that requires constantly coming up with creative ideas. We’re starting to see people thinking about, “What if every experience for every player can be new?” You couldn’t have a personal copywriter writing that much content for each person, but an AI could do it. And it could get down to a level where each decision you make generates a new experience based on whatever temporal inputs you want to give the system.

“We went from handcrafted goods to mass-produced goods to mass personalization in a way we’ve probably never seen before”

Media applications as well. Earlier in my career, I used to work at the Wall Street Journal, and the parent company of the Wall Street Journal was Dow Jones. They had a sister news department called Dow Jones Newswires, which was about getting financial news mainly to traders and folks that needed to act very quickly on that information as fast as possible through terminals and things like that. I think about what an AI could do to augment news or get news to the end user more quickly. Again, I don’t think it’s replacing journalists at all, I think it’s augmenting the amount of information and the targeting we can provide to folks much more quickly.

I think about entertainment use cases. This promise of personalized television and premium content services has always been out there, but when you get to the long tail of internet content and user-generated content, it tends to be pretty low-quality. Could you have a high-quality, personalized content delivery service? I think AI could impact that equation in the future.

Talia Goldberg: I love the concept of personalization and everyone having their own experience. We went from handcrafted goods to mass-produced goods to mass personalization in a way we’ve probably never seen before. This is a totally new experience for everyone, which is super cool. I’ll share one of the areas that I think is going to be wildly impactful and really promising, which is in life sciences and biotech.

“The AI is seeing something that we, as humans, have never before been able to see”

Applying AI to drug discovery and development using huge amounts of data to look at molecules and protein structures and genomic data can be really transformative. I read this study that I think was in Nature a month ago, and it described how some researchers gave an AI a bunch of images of a human retina, and the AI, with 90% accuracy, said which retina belonged to either a male or a female. That seems very basic – who cares? But what’s really crazy about that is that no researcher, scientist, or AI expert has ever been able to find any sign of a retina correlating to gender of any form. The AI is seeing something that we, as humans, have never before been able to see.

You think about that, and then you apply that to cancer and different cells and otherwise, and the potential is just massive and really exciting. And we’re already seeing a lot of applications. AI’s going to transform a whole bunch of things – health, software, business applications, logistics, consumer… We could make a long list, but there are a ton of reasons to be super optimistic.

Mix and match

Krystal Hu: When I talk to startups, when they’re incorporating this kind of technology into their offerings, one choice they have to make is which model they work with. Do they only work with one type of model, or do they diversify their vendors to work with other companies besides OpenAI? I’m sure, Fergal, you’ve spent some time thinking about that. What was the experience like at Intercom?

Fergal Reid: With new technology, being right in the forefront, our head tends to go first to customer value. We’re happy to use the most expensive or newest model to try and figure out, “Okay, can we really build a transformative experience for a customer with this that is a core part of the workflow and makes something valuable for them?” And once we do that, we’re like, “Okay, now, how can we make it cost-effective?” And we’re probably going to end up with a large mix of different models from, say, OpenAI, and we’ve also looked at other vendors like Anthropic, which are doing some really interesting work in this space too.

“It’s highly likely that everyone’s going to end up running a bespoke mix of many different models. It’s going to get complex pretty fast”

It’s an exploding space with many different people training large language models, and I think you’ll have different large language models that are better and worse and have different trade-offs in terms of cost and latency and performance. Performance won’t be one-size-fits-all. Some models are better at dealing with hallucinations, some are better at generating creative content, and I think we’re already seeing that.

Our focus is to get whatever models we can, try them out, think if we can use these to build transformative value, get it live with our customers, and then figure out how to optimize that. Once we know it’s delivering value, let’s optimize it in terms of price and cost and work. It’s highly likely that everyone’s going to end up running a bespoke mix of many different models. You could have three different models in one customer interaction. So yeah, it’s going to get complex pretty fast.
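
To illustrate the kind of mix Fergal describes, here is a minimal sketch of routing each step of one customer interaction to the cheapest model that can do the job within a latency limit. The model names, costs, latencies, and task labels are all hypothetical placeholders, not Intercom's actual setup or any vendor's real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str              # hypothetical model label, not a real product
    cost_per_call: float   # hypothetical relative cost
    latency_ms: int        # hypothetical typical latency
    strengths: frozenset   # task types this model handles acceptably


def pick_model(task, options, max_latency_ms):
    """Return the cheapest model that supports `task` within the latency limit."""
    candidates = [
        m for m in options
        if task in m.strengths and m.latency_ms <= max_latency_ms
    ]
    if not candidates:
        raise LookupError(f"no model can handle {task!r} within {max_latency_ms} ms")
    return min(candidates, key=lambda m: m.cost_per_call)


# Illustrative catalog only; every number here is made up.
CATALOG = [
    ModelOption("small-fast", 1.0, 200, frozenset({"classify"})),
    ModelOption("mid-grounded", 5.0, 800, frozenset({"classify", "answer"})),
    ModelOption("large-creative", 20.0, 2000, frozenset({"answer", "rewrite"})),
]

# One interaction, three steps, each with its own latency limit.
STEPS = [("classify", 500), ("answer", 1500), ("rewrite", 3000)]
plan = [pick_model(task, CATALOG, limit).name for task, limit in STEPS]
```

With this made-up catalog, the three steps end up on three different models, which is the "three different models in one customer interaction" scenario in miniature.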

“There’ll probably be a new set of metrics that everyone coalesces around that measure the effectiveness of your AI and your particular business problem”

Ethan Kurzweil: I think that’s an interesting point that actually ties into the question from before: how do you measure the success of this? Because I think lots of companies will try a model or many, and the question will be, “All right, which is best?” And that’s such an oversimplification because you have to figure out what you are trying to achieve. Are you trying to achieve engagement with users? Are you trying to achieve a quick resolution?

I think there’ll probably be a sort of metricization of this where people come to a standard, like the way Google Search created a new industry, AdWords, and the way we measure click-through rates and conversion rates and things like that. There’ll probably be a new set of metrics that everyone coalesces around that measure the effectiveness of your AI and your particular business problem.

Fergal Reid: Yeah, even before these more recent language models, we’ve had bots that process natural language using pretty big neural networks, although not as big. And whenever we would do something like upgrade our bots, we would conduct a large-scale A/B test framed in terms of end-user metrics such as self-serve rate. Then, we would find edge cases for particular customers or domains where it performed less well, really dig into those, and make sure nothing was broken. I think there’s probably a well-understood playbook, like Ethan’s referring to, of metrics for given domains. A lot of the same things will apply to this new type of technology.
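
As a rough illustration of the playbook Fergal outlines, the sketch below compares self-serve rates between an old bot (A) and an upgraded bot (B) with a standard two-proportion z-test, then lists segments where the new bot did worse. The function names and data layout are assumptions made for this example; the transcript does not say which statistical test Intercom uses.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the rate of B minus the rate of A."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def regressed_segments(results):
    """List segments (e.g. customers or domains) where B's rate is below A's.

    `results` maps a segment name to (success_a, total_a, success_b, total_b).
    """
    return [
        segment
        for segment, (sa, ta, sb, tb) in results.items()
        if sb / tb < sa / ta
    ]
```

For example, `two_proportion_z_test(300, 1000, 360, 1000)` compares a 30% self-serve rate with a 36% one. An overall win can still hide segments that got worse, which is why the second function exists.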

Q&A

Krystal Hu: I’d love to get to the Q&A. I think we were able to address some of the questions during our discussions, but there are a bunch about the potential roadmaps from, I assume, Intercom’s customers or companies working with Intercom who want to know what could be the next AI-aided feature that may come out, both in the short-term and long-term, and also how that will affect the pricing strategy.

Fergal Reid: Cool. Do you want to call out one particular question?

Krystal Hu: I think there was a question about your roadmap for the features for the next six months versus 12 to 18 months, and then the other person asked about the pricing strategy.

Fergal Reid: We have some things coming up that unfortunately, I can’t talk about at the moment. I would say that six months is a really long time in this space. I expect you’ll see a lot of movement in this space over the next two or three months. We will continue to sweat and invest in these features in our Inbox to make support reps more efficient. I’ve already talked about how we’ve got a Generation 1 version of features here at the moment, with summarization and expansion features to help edit a text, and we’re definitely working on Generation 2 versions of those features.

We’ve also got two other features in this space that we’re really excited about, but unfortunately, I can’t share any details at the moment – it’s just a little bit too early to announce and launch those things. I can promise you’re going to see a lot of action and excitement from us, and I’m sure from other companies in this space as well, over the next six months.

“Right now, there’s a bit of a context limit for each interaction with an AI in a large language model, but there’s a lot of exciting research pushing the boundaries”

Krystal Hu: Talia and Ethan, do you have any expectations or hopes on how fast the space could move?

Talia Goldberg: Well, it’s moving a lot faster than we even anticipated. The space is moving really quickly in part because there are a whole bunch of technological advances happening at the same time as the hardware that these models are trained on gets better and improves at a Moore’s Law rate, and there are new architectures and ways of scaling on that hardware. We’re getting better and better at creating new experiences and models.

I don’t have an exact prediction of how quickly we’ll see different things, but one of the biggest areas that we’re watching closely and expect to see a lot of advances in over the next six to 12 months is around personalization and being able to create far more personalized experiences. Right now, there’s a bit of a context limit for each interaction with an AI in a large language model, but there’s a lot of exciting research pushing the boundaries of those context windows, coming up with new frameworks to create far more personalized experiences and remember each person, each user, each customer, and tons of data points about that person to create a better experience.
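
To make the context limit concrete, here is a minimal sketch of one common workaround: keeping only the most recent messages that fit within a fixed budget. Counting whitespace-separated words is a crude stand-in for a real tokenizer, and the function names and budget are assumptions for illustration only.

```python
def rough_token_count(text):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def trim_history(messages, budget):
    """Keep the newest messages whose combined size fits within `budget`."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = rough_token_count(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Anything older than the cut-off is simply forgotten, which is the limitation that larger context windows and new memory frameworks aim to ease.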

“I would encourage everyone that’s building new products to ride the ups and the downs”

Ethan Kurzweil: I completely agree with Fergal and Talia. We’re going to see predictable and unpredictable applications of this over the next six months. There’ll be some unsuccessful ones too, and then the narrative will quickly shift to, “Oh, that didn’t do everything we thought it was going to do as quickly as we thought.” Right now, we’re into the peak of momentum and euphoria and, dare I say, a little hype in the space. But over time, it’ll become as big a deal as we thought.

I would encourage everyone that’s building new products to ride the ups and the downs. Don’t ride the up as high as it may be feeling like you should right now, but when the narrative changes a little bit, because it will – all new technologies have that “Oh, that wasn’t as impactful as quickly as we thought” moment – I would encourage everyone to keep building through that as well.

Krystal Hu: Yeah, I’ll definitely take that from you as a crypto investor.

Ethan Kurzweil: It’s the same thing. Right now, there’s clearly a trough of disillusionment for crypto. And good builders are figuring out, “Okay, what of this technology is applicable? What makes sense?” And we’ll see those things come to market over the next couple of years.

Krystal Hu: One common question I saw in the Q&A is: how will this impact human jobs like human agent jobs? I’m curious to hear your thoughts on this specific case.

Ethan Kurzweil: Ever since the advent of the computer, there’s been this prediction that it would be a massive replacement for human work at all levels. It does change the nature of work, and this will certainly change in some ways, but it’s not going to be a wholesale replacement for humans in any broad way.

“Guess what? Every year, we need way more engineers. It’s like Jevons Paradox. The more that AI is available and the cost goes down, the more demand there is”

Talia alluded to the example of Copilot, and I’ve read many articles saying this was going to put all developers out of business over the next two to three years, and that’s complete BS. Everyone knows that’s not true. But it may allow for more productivity and for the cycle time on software to speed up. It may allow for different skills to be more important. I think it just makes us more productive. It’s not that it won’t have an impact and won’t shift the nature of work in some ways. I don’t mean to minimize that because that’s very real. But looking at the whole, you’ll see we come out ahead.

Talia Goldberg: At least until we reach the singularity, I’m pretty convinced of the need for more and more engineers. You could have gone back 10 years and been like, “Ah, there are all these cool developer tools now that are coming out and making it way easier to integrate things,” or, “Oh gosh, there are these self-serve products like Zapier that make it easy for non-technical people to connect products and integrate things.” And guess what? Every year, we need way more engineers. There’s a shortage. It’s like Jevons Paradox. The more that AI is available and the cost goes down, the more demand there is. And I think, in a lot of areas, that paradigm will hold true. But as Ethan said, the exact skills and the way it looks may shift.

Krystal Hu: Yeah, that makes a lot of sense. I saw some interesting observations and questions about whether you should tell people that they are talking to an AI versus a real person. It’s an interesting question because it presumes we wouldn’t be able to tell.

Ethan Kurzweil: It’s a good existential question. If you’re talking to a person assisted by an AI, who are you talking to, and what disclosure do you have to make in that case? I don’t have any answers to these questions, but they’re great ones for us to debate.

Talia Goldberg: I find that AI can sometimes generate responses that are so detailed and so good that there’s just no way that a human did it anyway. It’s like the reverse of the Turing test.

Krystal Hu: Another question about the safety functionality. I think we also touched on this earlier, but there is a specific question: “How important is vertical integration of safety functionality with the model provider? For example, how important is it to use OpenAI’s moderations API with ChatGPT model output versus mix and match with Jigsaw’s Perspective API?” Fergal, you may have some thoughts or experiences to share on that.

Fergal Reid: Yeah, I’m not familiar with Jigsaw’s Perspective API, so I don’t know that specifically. All the folks at OpenAI and Anthropic and whoever else that are training large language models care a lot about making them usable and safe and aligned, and they care a lot about avoiding hallucinations. And they’re going to continue to work in these areas to make it easier for companies like Intercom to deploy those in trustworthy ways. I’m not convinced that we need to vertically integrate that. I don’t know that Intercom needs to be in the business of training its own massive large language models for us to tackle productization and making them trustworthy enough. I think we’re going to see a lot of movement in this space anyway.

This sort of generative AI gives a lot of freedom to the user to try and figure out how to deploy the model. There’s this emerging field of prompt engineering, and my team is doing a lot of this, where they’re editing prompts and trying to figure out, “Okay, how do I ask the model what I want in the right way to get it to give me the result I’m looking for?” That’s going to get better, at least for a while, that’s going to get more powerful, and the models are going to get easier to control.

I think we’re going to be able to see companies in Intercom’s position generate a lot of value and figure out a lot of applications and design. We’re still learning how to design products around this new technology. There are so many degrees of freedom for people in our position to use that.

“There’s always this tension: do you just piggyback on the general thing? How much better does the general model get versus fine-tuning?”

Krystal Hu: There were also questions about Intercom building its own model. As you mentioned earlier, maybe there will be opportunities to do a mix of which model works better for your use cases while making an API or something like that?

Fergal Reid: Yeah, with the scale that these models are trained on at the moment, it doesn’t seem to make economic sense for every company that’s Intercom’s size to be training their own. But again, there’s a spectrum here. We will develop expertise in designing around them and knowing what to ask the model for. And we’re probably going to see emerging functionality around companies like Intercom fine-tuning models. A lot of these new models are trained with reinforcement learning from human feedback. The cost of doing that will probably come down over time, and we’ll be able to customize them more to our specific use cases.

There’s always this tension: do you just piggyback on the general thing? How much better does the general model get versus fine-tuning and doing specific things? We’ll have to see how this space plays out, but I think there are going to be a lot of degrees of freedom for companies to take these models and customize and productize them for their area. We’re in the early days of the productization of this technology. It’s going to change a lot, and it’s going to become a lot easier to productize.

Krystal Hu: We’re almost approaching the end of our wonderful conversation, but we can take two more questions. One is about how enterprise companies adopt and extract value from ChatGPT. You have seen companies starting to integrate that in their offerings, and on the other side, I think companies, especially highly regulated banks, were wondering about the information security and privacy issues and banning their employees from playing around with it on company laptops. I’m curious to hear Talia and Ethan’s thoughts on this question.

Talia Goldberg: Across our portfolio, a lot of software companies that may not even be in categories like Intercom, which are really at the forefront, are thinking like, “Hey, how is this important for my business and what are the ways that I might integrate some of these models or ChatGPT APIs into my product?” Highly repetitive tasks can be really great for an AI to help automate or streamline. One of our companies gets a lot of accounting information from their customers, and they need to reconcile and flag if there’s an error or something that is off. And they’ve had these rule-based systems in the past, but you can apply AI and have much better accuracy. Another interesting example is related to the summarization piece. If a customer talks to a call center agent or a sales rep, you can summarize that conversation and create custom marketing collateral just for that person.

Krystal Hu: One last question for Talia and Ethan. People were asking what you were looking for when investing in pre-seed startups or, I guess, startups in general.

“We try to break it down to that key question of, ‘does this really move the needle for some particular role or type of person?'”

Ethan Kurzweil: That’s a great question. There are so many different answers to that. Pre-seed is a little earlier than we usually invest, to put that disclaimer out – usually, we’re investing in a later seed or Series A or B. But our philosophy is to look for hyper-growth models wherever we can find them. And usually, the way we break that down is to try to pre-diagnose through road-mapping. Talia’s been the one pushing a lot of our thinking around AI and its applications to various different things, and we come up with these roadmaps of different thematic areas we think are pretty interesting. They could be really broad, like cloud computing or the consumerization of healthcare, or narrow, like AI’s impact on customer service.

I would encourage folks to look, because we do a lot of publishing on our blog and social media of our active theses, to see if what you’re building is aligned with something. And then, generally speaking, we’re looking for, “Does this have the sort of impact that’ll change the way we work or do entertainment or something that could be a paradigm shift in some business process or consumer need?” That’s what we break it down to. We’ve noticed that anytime you have a broad-based change of behavior, that leads to hypergrowth companies and opportunities for startups to disrupt how work or play or whatever was done before. And so we try to break it down to that key question of, “does this really move the needle for some particular role or type of person?”

Krystal Hu: That’s the end of our conversation. For those who haven’t got a chance to try Intercom’s new features, I encourage you to play with the summarization and a few other features yourself. And if you’re interested in the venture space, definitely take a look at Bessemer’s website. As everyone said, six months from now, we’ll be looking back and some of the predictions will come true, and maybe some will be totally different. I hope we’ll have another time to circle back and cover more questions. Thanks again, Talia, Ethan, and Fergal, for your time today.

Ethan Kurzweil: Thanks for having us.

Talia Goldberg: Bye.

Fergal Reid: Thank you so much, everyone. Bye-bye.