The world of AI is moving fast, and here at Intercom, we’re helping set that pace. Today, we’re delighted to introduce Fin, our new chatbot powered by OpenAI’s GPT-4 and Intercom’s proprietary machine learning technology.
Just a few weeks ago, we announced our first GPT-powered features in Intercom – a range of useful tools for customer service reps – and our customers have been really enjoying the extra efficiency these features deliver.
The big goal, though, was creating a GPT-powered chatbot that could answer customer queries directly. To do this, it needed to be able to harness the power of large language models but without the drawbacks posed by “hallucinations”. Initially, we weren’t sure how long it would take to crack this problem, but now, with the release of GPT-4 by OpenAI, we can reveal that we’ve built a chatbot that can reliably answer customer questions to a high standard. We’ve called it Fin.
In today’s episode of the Inside Intercom podcast, I sat down with our Director of Machine Learning, Fergal Reid, to discuss our new AI chatbot, how we built it, what it does, and what the next steps look like for this remarkable breakthrough.
Here are some of the key takeaways:
- Our new AI chatbot can converse naturally using the latest GPT technology.
- Fin ingests the information from your existing help center and uses only that knowledge, giving you control over how it answers questions about your business.
- At Intercom, we believe that the future of support is a mix of bots and humans. Fin won’t be able to answer all customer queries, and in those situations, it can pass harder questions to human support teams seamlessly.
- We’ve reduced hallucinations by about 10x, building constraints that limit Fin to queries relating to your business, based on a knowledge base you trust.
If you enjoy our discussion, check out more episodes of our podcast. You can follow on Apple Podcasts, Spotify, YouTube or grab the RSS feed in your player of choice. What follows is a lightly edited transcript of the episode.
A bot by any other name
Des Traynor: Welcome to an exciting episode of the Intercom podcast. I’m once again joined by Fergal Reid, our Director of Machine Learning, and he’s gonna tell us about the launch of something that we’ve been asked for pretty much every day since ChatGPT launched.
“This will actually be a bot that you can use for your business that has the natural language processing capability of ChatGPT but will answer questions specifically about your business”
Fergal Reid: Yeah, thanks Des. Ever since ChatGPT came out, people were like, ‘Hey, can I use that to answer my support questions for my business?’ And we were always like, ‘Oh, we don’t know. We’re not sure about the hallucinations.’ But today we’re really excited to announce this product because we think we’ve done it. We think we’ve built something – this will actually be a bot that you can use for your business that has the natural language processing capability of ChatGPT but will answer questions specifically about your business. And we’ve built it using your help center, so it won’t answer questions randomly from around the internet or anything. You can control what it says. We’ve managed to get the accuracy rate up a lot through using OpenAI’s new GPT-4 model, which we have access to in beta. So I’m really excited about this.
Des: So the idea is that what people have experienced and kind of fallen in love with in ChatGPT, which is effectively this bot you can ask anything to and it gives a good stab at answering. You can do that for your business?
Fergal: Yes. Sort of. So we’ve deliberately made it so you can’t ask it anything. The idea is to build something that has the same sort of conversational understanding that we’ve seen with ChatGPT but that specifically only answers questions about your business. You can ask it something wild like, who was the 22nd president of America? And it’ll be like, ‘Hey, I’m only here to answer customer support questions about this specific business.’
Des: Cool. So it actually knows effectively what it should and shouldn’t attempt?
Fergal: Yeah, exactly. That’s the idea.
A bot breakthrough
Des: I feel like seven or eight weeks ago you said we weren’t going to do this because it wasn’t possible or wasn’t going to be easy or something like that?
“Every customer was asking us about it”
Fergal: So, six or seven weeks ago, when we started looking at this technology, we were like, ‘Wow, can we build this? Can we build ChatGPT for your business?’ That was top of everyone’s minds. Every customer was asking us about it. We were kind of looking at it and we were going, gosh, this hallucinates a lot, this will give you inaccurate results. Wildly inaccurate results, totally made up things. We were like, ‘It’s a very exciting technology, but we’re not sure if we can actually constrain it and stop it hallucinating enough.’ And we spent a lot of time playing with GPT, ChatGPT, GPT-3.5.
“When we started playing with it, we thought, wow, this seems a lot better. It can still hallucinate sometimes, but it hallucinates a lot less, maybe 10 times less”
We could just never get it to know when it doesn’t know something. But recently we’ve got access to a new beta from OpenAI of their new GPT-4 model. And one of the things they told us was, ‘Hey, this is designed to hallucinate a lot less than some of the other models we’ve seen in the past.’ And so, you know, we were like, ‘Wow, that sounds very interesting. That sounds very exciting, GPT-4, what’s it gonna do?’ And we spun up an effort to look at this and to put it through some of our test beds to check and examine for hallucinations. And when we started playing with it, we thought, wow, this seems a lot better. It can still hallucinate sometimes, but it hallucinates a lot less, maybe 10 times less, something like that. And so we were extremely excited. We were like, ‘Wow, okay, this suddenly feels like this is something. This is good enough to build a bot with, this is a generation ahead of the GPT-3.5 we’re using. It’s just a lot further along, in terms of how trustworthy it is.’
Des: Exciting. What does the test do – are there torture tests that we put these bots through to see exactly whether they know they’re bullshitting, basically?
Fergal: So we’re not that far along. For our previous generation of models – for example, for resolution bot – we had this really well developed set of battle-tested benchmarks that we’d built over years. All this new technology is months old, so we’re not quite as principled as that. But we have identified a bunch of edge cases, just specific things. We’ve got a spreadsheet where we keep track of specific types of failure modes that we’re seeing with these new models. And so when GPT-4 came along, you’re like, okay, let’s try this out. Let’s see what happens when you ask it a question that isn’t contained in an article or a knowledge base at all. Or you ask it a question that is similar, but not entirely the same as what’s actually there.
And you know, with GPT-3.5 and with ChatGPT, if it doesn’t know something, it’s almost like it wants to please you, to give you what you want. And so it just makes something up. And with GPT-4, they obviously have done a bunch of work on reducing that. And that’s just really obvious to us. So when we put it kind of through our tests, it’s possible to get it to say, ‘I don’t know’, or to express uncertainty a lot more. That was a real game changer for us.
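A test bed like the one Fergal describes can be sketched as a list of out-of-knowledge-base probe questions and a check on whether the reply admits uncertainty instead of inventing an answer. Everything below is illustrative – the marker phrases and the stubbed model are assumptions, not Intercom’s actual harness:

```python
# Hypothetical hallucination probe harness: count how often a model
# declines to answer questions its knowledge base cannot support.

UNCERTAINTY_MARKERS = ("i don't know", "i'm not sure", "i don't have information")

def admits_uncertainty(reply: str) -> bool:
    """True if the reply expresses uncertainty rather than asserting an answer."""
    lower = reply.lower()
    return any(marker in lower for marker in UNCERTAINTY_MARKERS)

def score_probes(model, probes) -> float:
    """Run out-of-knowledge-base probes; return the fraction where the model declines to guess."""
    declined = sum(admits_uncertainty(model(q)) for q in probes)
    return declined / len(probes)

# A stub standing in for a real LLM call, so the harness can be demonstrated:
def stub_model(question: str) -> str:
    return "I don't know the answer to that." if "unsupported" in question else "Yes, absolutely."

probes = [
    "Does the unsupported legacy plan include SSO?",
    "Can I pay in doubloons? (unsupported)",
]
rate = score_probes(stub_model, probes)  # fraction of probes safely declined
```

In practice the "spreadsheet of failure modes" becomes the probe list, and the uncertainty check would itself likely be a model-graded judgment rather than keyword matching.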
“At Intercom, we believe that the future of support is a mix of bots and humans”
Des: And if the bot doesn’t know, can it hand over to a human?
Fergal: Absolutely. At Intercom, we believe that the future of support is a mix of bots and humans. We’ve a lot of experience from resolution bot in making a nice handover from the bot to the human support rep, hopefully getting that support rep ahead of the conversation, and we think we still need to do that with this bot. There will always be issues where, say, someone’s asking for a refund. Maybe you want a human to approve that. So there’s always gonna have to be a human approval path. At Intercom we’ve got a really good platform around workflows and you’re going to be able to use that to control when the bot hands over and how it hands over. We’ll make sure that this new bot integrates with our existing platform just the same way that our existing bot did.
Des: And I presume the bot will have disambiguated or qualified a query in some way, perhaps summarized it, even as it hands it over?
Fergal: We don’t have any summarization feature in there at the moment, but the bot will attempt to disambiguate and draw out a customer response. Our existing resolution bot does a little bit of that. This new bot, because it’s so much better at natural language processing, can just do that more effectively. So that may mean that the handling time goes down for your rep, for the questions that the bot has touched. So yeah, pretty excited about that too.
The art of conversation
Des: Listeners to our Intercom On Product podcast would know I’m often fond of saying that having a capability, even a novel capability that’s useful, isn’t enough to have a great product. How have you wrapped a product around – what were your goals? What are the design goals for building an actual product around this GPT-4 capability?
Fergal: So we realized pretty early on that there was really a set of design goals that we were trying to head towards. First and foremost, we wanted to capture the natural language understanding that people saw and were very impressed with, with ChatGPT. We wanted to get a generation above what was there before. So if you ask a pretty complicated question, or you ask one question and then a follow-on question, it understands that the second question is to be interpreted in light of the one before. Our previous bot didn’t do that. And most bots out there just don’t do that. That was just too hard. You know, conversations are very tricky environments for machine learning algorithms. There’s a lot of subtlety and interplay in a support conversation, but this new tech seems to do great at that. So our first goal was to capture that.
“There’s a lot of subtlety and interplay in a support conversation, but this new tech seems to do great at that”
Des: So as an example of that, you might ask, ‘Do you have an Android app?’ and then, ‘Well, what about iPhone?’ To ask ‘What about iPhone?’ makes no sense unless you’ve previously parsed it with ‘Do you have an Android app?’, as an example. So it’s about gluing things together to understand the conversational continuity and context.
Fergal: Exactly. And with that, it just flows more naturally. We specifically notice this with the new bot when you ask it a question and you get an answer and it’s not exactly what you asked, you can just be like, ‘Oh, but no, I really meant to ask for pricing.’ And it kind of understands that and it’ll give you the more relevant answer. We feel as though that’s a real breakthrough technology.
Des: Can it push back on you and say, ‘Say more?’ Can it ask you follow-on questions to qualify your questions? So if you come up with something vague, like, ‘Hey, does this thing work?’ Would it try to solve that? Or would it respond with, ‘I need more than that.’
“To build a good product experience, it’s almost like we’ve got loads of flexibility and loads of power but now what we need is the ability to limit it and to control it”
Fergal: So, natively, the algorithms will do a certain amount of that, but with this sort of technology, we get this very advanced capability and then actually what we’re trying to do is we’re trying to constrain it a lot. We’re trying to actually say, ‘Okay, you can do all this out of the box, but we need more control.’ To build a good product experience – like you alluded to earlier – it’s almost like we’ve got loads of flexibility and loads of power but now what we need is the ability to limit it and to control it. So we’ve built experiences like that. We’ve built a disambiguation experience where, if you ask a question and there isn’t enough information, we have it try to clarify that, but we control it.
We’ve engineered prompts where you have special purpose applications with the technology to do each task in the conversation. So we’ve one prompt to get you to ask a question; another one to disambiguate a question; another one to check to see if a question was fully answered for you. And so we start off with this very powerful language model, but we really just want to use it as a building block. We want to control it. We achieve that control by breaking it up into special purpose modules that do each thing separately.
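The modular-prompt idea Fergal describes can be sketched roughly like this: each step in the conversation gets its own narrow prompt, and the chain is ordinary control flow. `call_llm` is a stand-in for a real model API, and every prompt string and return value here is invented purely for illustration:

```python
# Hypothetical prompt chain: disambiguate -> answer from articles -> verify.

def call_llm(prompt: str) -> str:
    """Placeholder for a large language model call; returns canned answers for the demo."""
    if "Is the question unambiguous" in prompt:
        return "yes"
    if "Was the question fully answered" in prompt:
        return "yes"
    return "You can change your plan from the billing page."

def disambiguate(question: str) -> bool:
    """Special-purpose prompt: decide whether the question is clear enough to answer."""
    return call_llm(f"Is the question unambiguous enough to answer? Question: {question}") == "yes"

def answer(question: str, articles: list[str]) -> str:
    """Special-purpose prompt: answer using only the supplied help center articles."""
    context = "\n\n".join(articles)
    return call_llm(f"Answer ONLY from these articles:\n{context}\n\nQuestion: {question}")

def check_resolved(question: str, reply: str) -> bool:
    """Special-purpose prompt: check whether the reply actually answered the question."""
    return call_llm(f"Was the question fully answered? Q: {question} A: {reply}") == "yes"

def handle(question: str, articles: list[str]) -> str:
    if not disambiguate(question):
        return "Could you tell me a bit more about what you mean?"
    reply = answer(question, articles)
    return reply if check_resolved(question, reply) else "Let me pass you to a teammate."
```

The point of the decomposition is control: each module can be tested, constrained, and swapped independently, rather than trusting one giant prompt to do everything.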
With great product comes great responsibility
Des: So at a foundational level, we’re saying it can converse naturally. The biggest advantage of that, to my mind, as a product is that you’ll be comfortable putting it as the first line of solution in front of your customers. I was gonna say defense, but it’s not a military operation. But you’d be comfortable putting it out there as if to say, ‘Hey, most conversations go here.’ And the fact that it can have a back and forth, it can maintain context, it can disambiguate means it’s well-equipped to do that. What else did you add in? It’s not just sitting there to chat – so what else does it do?
Fergal: The first thing I would say is, different businesses are probably gonna have different levels of comfort in terms of how they deploy this. This bot that we’ve built draws all its information from your help center – I’ll come back to that. Some people might say, ‘I have a really good help center. It’s very well curated. I’ve put a lot of articles in there over time, and I want to have the bot dialogue and answer all those questions.’ There will be other customers who want the bot to come in more opportunistically and bow out [itself], and we’re working on building settings to enable people to control their level of comfort with that.
Des: Some sort of threshold for when the bot should jump in.
“We’re integrating the bot with all our existing workflows to help you get that control about when you want it to come in and, more importantly, when you want it to leave so you can hand over to your existing support team when it’s reached its end”
Fergal: Exactly. And at the moment we have a pretty big workflows capability that you can use. And we’re integrating the bot with all our existing workflows to help you get that control about when you want it to come in and, more importantly, when you want it to leave so you can hand over to your existing support team when it’s reached its end.
Des: So if there are no support agents online, or if the user’s on a free plan, just send them straight to the bot. If it’s a VIP customer and agents are sitting idle, send them straight to the agent.
Fergal: Exactly. So what we’re trying to do here is take this new technology and then integrate it with our existing platform, which has all those features that people need in order to build what would be considered industry-standard bot deployment.
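The kind of routing rule Des sketches could look something like the snippet below. The field names and the specific policy are invented for illustration; in the product this would be configured through workflows, not code:

```python
# Hypothetical bot-vs-agent routing rule based on availability and customer tier.
from dataclasses import dataclass

@dataclass
class Conversation:
    vip: bool          # is this a VIP customer?
    agents_online: int # support agents currently online
    agents_idle: int   # agents with spare capacity right now

def route(convo: Conversation) -> str:
    if convo.agents_online == 0:
        return "bot"    # nobody to hand over to, so the bot takes it
    if convo.vip and convo.agents_idle > 0:
        return "agent"  # VIP with idle agents: skip the bot entirely
    return "bot"        # default: bot first, humans as the fallback
```

The same mechanism covers the handover direction too: a workflow decides not just when the bot comes in, but when it steps aside.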
“The next major design goal we had was to avoid hallucinations”
So the next major design goal we had was to avoid hallucinations. We’ve talked about reducing hallucinations and how it was a design goal of ours to have the bot converse naturally. But we really wanted to give our customers control over the sort of questions it could answer. Now with these bots, this new AI technology, you get access to a large language model and it’s been trained on the entire text of the internet. So it has all that knowledge in there. And one way – kind of the simplest way – to deploy this is to be like, ‘Hey, I’m just gonna have the bot answer questions using all of its information about the internet.’ But the problem with that is that if it doesn’t know something, it can make it up. Or if it does know something, maybe you don’t want it talking to your customers about a potentially sensitive topic that you know it has information about. You might think, ‘I’m not sure how my business or my brand feels about whatever information it got off some weird website. I don’t want it having that conversation with my customer.’
“We’ve done a lot of work to use the large language model to be conversational; to use it to understand a help center article you have; but to constrain it to only giving information that’s in an actual help center article that you control and that you can update and you can change and you can edit”
So we’ve done a lot of work to use the large language model to be conversational; to use it to understand a help center article you have; but to constrain it to only giving information that’s in an actual help center article that you control and that you can update and you can change and you can edit. And so that was a major design goal for us, to try and make this bot trustworthy, to take the large language models, but to build a bot that’s constrained to using them to just answer questions about your business and about your business’s help center.
That was a lot of work, and we’re very proud of that. We think that we’ve got something that’s really good because you get that conversational piece. You get the AI model’s intelligence to get an actual answer from a help center article, but it’s constrained. So it’s not gonna go and start having random conversations with end users.
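A minimal sketch of that constraint is retrieve-then-answer: only help center articles are handed to the model, and if nothing relevant is retrieved, the bot declines rather than drawing on general knowledge. The keyword-overlap scoring below is a naive stand-in for real retrieval, purely to show the shape:

```python
# Hypothetical retrieval-constrained answering: the bot can only speak
# from the help center articles it retrieves.

def retrieve(question: str, articles: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k article bodies sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = []
    for title, body in articles.items():
        overlap = len(q_words & set(body.lower().split()))
        if overlap:
            scored.append((overlap, body))
    scored.sort(reverse=True)
    return [body for _, body in scored[:k]]

def answer_from_help_center(question: str, articles: dict[str, str]) -> str:
    context = retrieve(question, articles)
    if not context:
        # Nothing relevant in the knowledge base: decline instead of guessing.
        return "I can only answer questions about this business, and I don't have that information."
    # In a real system the retrieved context would go into an LLM prompt;
    # here we just show the constraint.
    return f"Based on our help center: {context[0]}"
```

Because the article text is the only source the model is given, updating or deleting an article directly changes what the bot can say.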
These bots, these models, it’s always possible – if you jailbreak them – to kind of trick them into saying something that’s off-brand or that you wouldn’t want. And that’s probably still possible, but we really feel we got to a point where that would require a determined hacking attempt to really make that work. It’s not just going to go radically off-script in normal conversations.
I think one thing that is very important to clarify is that these large language models are probabilistic. Hallucinations have decreased a lot and we think it’s now acceptable for many businesses, but it’s not zero. They will occasionally give irrelevant information. They’ll occasionally give incorrect information where they read your help center article, they didn’t fully understand, and so they answer a question wrong. Possibly a support agent will make mistakes too…
Des: Humans have been known to…
Fergal: Humans occasionally have been known to make mistakes, too. And so, these bots, you know, it’s a new era of technology. It’s got a different trade-off than what we had before. Possibly some customers of ours will be like, ‘I want to wait. I don’t want to deploy this just yet.’ But we think that, for many, many customers, this will cross the threshold, where the benefit of [being able to say] ‘I don’t need to do the curation, I don’t need to do the setup that I’ve had to do in the past with resolution bot, I can just turn this on, on day one, and suddenly all the knowledge that’s in my help center, the bot has it, the bot can try and answer questions with it.’ It won’t get it perfect, but it’ll be fast. We think that that’s going to be a worthwhile trade-off for a lot of businesses.
Des: In terms of setup, if you’re a customer with a good knowledge base, how long does it take you to go from that to a good bot? How much training’s involved? How much configuration?
Fergal: Very little time at all. Basically no training. You can just take the new system we’ve built and you can point it at your existing help center. It’s a little bit of processing time where we have to kind of pull it in and scrape it and get the articles ready for serving.
Des: Minutes? Seconds?
Fergal: We’re still working on that. We’re in minutes right now, but we think – maybe by the time this airs – it’ll be down a lot lower than that. There’s no hard engineering bottleneck to making that very, very low. And so we’re very excited about that.
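The “point it at your help center” setup step amounts to an ingestion pipeline: pull the articles in, split them into retrievable chunks, and index them. The sketch below is a guess at the shape of that pipeline, with fetching stubbed out; in practice it would be a scrape or an API export:

```python
# Hypothetical help center ingestion: chunk each article and build a
# (title, chunk) index ready for retrieval. No model training involved.

def chunk(text: str, max_words: int = 100) -> list[str]:
    """Split an article into word-bounded chunks for retrieval."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def build_index(articles: dict[str, str]) -> list[tuple[str, str]]:
    """Return (title, chunk) pairs - the unit the bot retrieves and answers from."""
    index = []
    for title, body in articles.items():
        for piece in chunk(body):
            index.append((title, piece))
    return index
```

That there is no training step is why setup is minutes rather than weeks: the only work is processing the articles into a serving-ready index.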
A product summary
Des: So in summary, give us the bullet points of this product. What should we tell the market about it?
“It’ll talk with you in a natural way, like you’ve seen with ChatGPT. The second thing is that you, as a business, can control what it says”
Fergal: The first thing I would say is that it’ll talk with you in a natural way, like you’ve seen with ChatGPT. The second thing is that you, as a business, can control what it says. You can limit the things it will talk about to the contents of your knowledge base. The third thing I would say is that hallucinations are way down from where they were. And the fourth thing I would say is that this is really easy to set up. You just take this, you point it to your existing knowledge set and you don’t need to do a whole bunch of curation.
Des: Because we’re Intercom, we’re not likely to chat shit and engage in a load of hype without at least some qualifications. What areas are we still working to improve?
Fergal: I guess the first thing I would say is that the accuracy piece is not perfect. This is a new type of technology. It’s a new type of software engineering trade-off. So, with resolution bot, resolution bot would sometimes come and give an irrelevant answer, but you could always kind of figure out what it was talking about, you could say, ‘That’s not quite relevant.’ This is a little bit different. This will sometimes give irrelevant answers, but it may also sometimes give incorrect answers. It may have just misunderstood the information in your knowledge base. A specific example of this is sometimes, say, if you have a list of times something can happen and a user asks [the bot], it might assume that list is exhaustive. It might assume that that list was all the times and then it will surmise, ‘Oh no, it wasn’t in the list in the article. So I’m gonna say the answer is no, it can’t happen. This thing cannot happen this other time.’
Des: So, you might have a knowledge base article that cites examples of when we will not refund your payment, with a list of two or three examples. And the language model will read that and conclude that there are only three conditions under which this happens. And it’s making a mistake in not seeing that these are just demonstrative examples, rather than an exhaustive list. Is that what you mean?
Fergal: Exactly. Its general knowledge and its general understanding are still a little bit limited here. So it can look at lists of things and make assumptions that are in the neighborhood of being okay, but not quite right. Yeah. So, most of the time, when we see it make an error, the error seems fairly reasonable, but still wrong. But you need to be okay with that. That’s a limitation. You need to be okay with the idea that sometimes it might give answers that are slightly wrong.
“We’re building this experience where you can take your existing help center and very quickly get access to a demo of the bot, pre-purchase, to play with it yourself and understand how well this works for your specific help center”
Des: Is it quantifiable? My guess is that it’s not because it’ll be different per question, per knowledge base, per customer, per acceptability… So, when someone says, ‘Hey, how good is the bot?’, how do you best answer that?
Fergal: The best thing to do is to go and play with a demo of it on your own help center. We’re building this experience where you can take your existing help center and very quickly get access to a demo of the bot, pre-purchase, to play with it yourself and understand how well this works for your specific help center.
Des: And you’d suggest, say, to replay the last 20 conversations you had, or your most common support queries? How does any individual make an informed decision? Because I’m sure it’ll do the whole, ‘Hello? Are you a bot?’ ‘Yes, I am’ thing.
Fergal: We think that, just by interacting with it, you can very quickly get an idea of the level of accuracy. If you ask your top 20 questions, the type of questions people ask you day in, day out… you probe around those, you ask for clarification. You get a pretty good sense of where this is good and where the breaking points are. For us, this is an amazing new product and we’re really excited about it – but it’s still generation one. We’re going to now improve all the machine learning pieces. We’ll improve all those measurement pieces over time as well.
Des: With resolution bot one, our previous bot, you would train it – so you would say, ‘Hey, that’s the wrong answer. Here’s what I want you to say’, et cetera. You’re not doing that this time around. So if you detect it giving an imprecise answer, or think it could do better, what’s the best thing to do? Do you write a better article? Do you look at its source?
Fergal: It’s still early days here and we probably will build features to allow you to have more fine control over it. But right now, the answer to that question is, ‘Hey, can you make your knowledge base article clearer?’ Actually, developing this bot, we have seen that there are a lot of ambiguous knowledge base articles out there in the world, where little bits could be clearer.
Des: What other areas do you think will evolve over the coming months?
Fergal: There’s a lot of work to do on our end. We’ve got version one at the moment. To improve it, we want to get it live with customers, we want to get actual feedback, based on usage. Any machine learning product I’ve ever worked on, there’s always a ton of iteration and a ton of improvement to do over time. We also want to improve the level of integration with our existing resolution bot. Our existing resolution bot requires that curation, but if you do that curation, it’s excellent. It can do things like take actions. You can wire it up to your API so that it realizes someone’s asking about resetting a password and will actually go and trigger that password reset.
“The last piece that I’m extremely excited about is this idea that we can take this new AI technology and use it to generate dramatically more support content than we’ve been able to in the past. Very quickly, this new bot, if the content’s in your help center, it’ll be able to answer using your content”
It’s really important for us that this kind of next-generation bot is able to do all those things as well. So initially it’ll be like, ‘Hey, answer informational questions from your knowledge base.’ Zero setup day one – get it live, it’s great. But eventually – and we’ve seen this in every piece of research we’ve done – you want to get to the next level. After that, people will want the ability to use that technology and capability we already have to take actions to resolve queries. And we’re excited that we might see that a lot more built on this next-generation, language-based platform.
Then, the last piece that I’m extremely excited about is this idea that we can take this new AI technology and use it to generate dramatically more support content than we’ve been able to in the past. Very quickly, this new bot, if the content’s in your help center, it’ll be able to answer using your content. And we think that’s great. There are a lot of people who are able to write help center articles who would’ve gotten stuck trying to curate bots or intents in the past. So we’re very excited about that. But we think there’s new tooling to build here, to make it dramatically easier to write that help center article content. For example, taking your support conversations and using this next-generation AI to bootstrap that process.
Des: So one vision we spoke about maybe only two months ago was the idea that the support team would be answering questions for the first time and the last time – I think that’s how I put it at the time. So if a question makes its way through, it’s because we haven’t seen it before. And once we have seen it, we don’t see it again. Is that how you see that happening?
“We think we can see a path to that where we can have a curation experience that is simple enough that a support rep in an inbox can just finish answering a conversation and be like, ‘Yes, I approved this answer to go into the bot'”
Fergal: I think, for the first time, I can see a path to that. When we shipped resolution bot 1.0, the feature request we were always getting was, ‘Hey, can I have my support rep in the inbox? Can I have them answer a question and then just put that question in the bot?’ And any time we tried to do that, it didn’t work because curating a question to be good enough to design an intent was just a lot of work. Across the industry, there are a lot of different support bots. I haven’t ever seen anyone who’s managed to nail this and make that really work. But now with the advanced large language models, we think we can see a path to that where we can have a curation experience that is simple enough that a support rep in an inbox can just finish answering a conversation and be like, ‘Yes, I approved this answer to go into the bot.’
There has to be some human approval because it can’t be that Fergal asks the bot, ‘Hey, what’s Des’ credit card number?’ and the bot is like, ‘Well, I know the answer to that because it was in this other conversation Des is in.’ That would be unacceptable. There has to be some approval step between private conversations and durable support knowledge. But we think we see a path to a much better approval process there than we’ve ever had before. And potentially a world where maybe not every issue, but a lot of issues, can be answered only once. We think there’s something cool coming there.
Des: Awesome. Well it’s an exciting release – is it available to everyone?
Fergal: This is just heading towards private beta at the moment, with the new release of GPT-4 from OpenAI.
Des: Exciting. Well, I’ll check in a few weeks and see how it’s going.
Fergal: Yeah. Exciting times.