How to measure the customer experience as AI scales

CSAT captures less than 10% of your conversations, and the responses you do get skew toward extremes – the delighted and the furious.

The vast majority of customers say nothing at all, especially when they’re busy and would rather move on.

That silence is a blind spot. You’re reporting to leadership and coaching your team based on a sample that doesn’t represent most of your customers. No amount of CSAT tuning fixes a response rate that low.

Chart: Extreme response bias in CSAT data. Responses cluster at the extremes of the scale – very dissatisfied or delighted – leaving the neutral middle almost empty.

Coverage isn’t the only limitation, either. CSAT also compresses multiple problems into a single score. A negative rating could be driven by the product, by frustration with a policy, or by poor service quality. The score alone won’t tell you which. You end up spending more time questioning the data than acting on it. A single score also hides the friction that actually shapes the customer experience: repeated explanations, multiple handoffs, or technically correct answers that still felt frustrating.

As AI handles more conversations end-to-end, the gap only gets bigger. A larger share of your customer experience sits outside direct human review, while CSAT still only reflects the subset of conversations that generate responses.

Teams need to understand what happened in every conversation, rather than just the ones where someone chose to fill in a survey.

What full coverage looks like

AI enables support teams to move from sampling to scoring every conversation. Instead of waiting for customers to tell you how it went, you can automatically evaluate each interaction. When every conversation is visible like this, you get a complete picture of what your customers are actually experiencing across service quality, resolution, and customer effort – the things a single survey was never able to show.

If you’re using Fin, CX Score provides this coverage. It evaluates every interaction (both AI and human), assigns a score from 1 to 5, and surfaces the reasons behind each score. For most teams, this results in roughly five times more conversation coverage than CSAT alone. If you’re using another solution, the principle is the same: you need AI-powered visibility into every conversation.

Fin CX Score showing % of conversations scored positive, neutral, or negative.
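To make the positive, neutral, and negative view concrete, here is a minimal sketch of how per-conversation scores could be rolled up into those three shares. The band boundaries are an assumption: the article only says scores run from 1 to 5 and treats 3 as the middle. The function names and the sample scores are hypothetical, and none of this reflects how CX Score is actually computed.

```python
from collections import Counter

BANDS = ("positive", "neutral", "negative")


def band(score: int) -> str:
    """Map a 1-5 score to a band.

    Assumption: 1-2 is negative, 3 is neutral, 4-5 is positive.
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


def band_shares(scores: list[int]) -> dict[str, float]:
    """Return the percentage of conversations that fall in each band."""
    if not scores:
        return {name: 0.0 for name in BANDS}
    counts = Counter(band(s) for s in scores)
    return {name: 100 * counts[name] / len(scores) for name in BANDS}


# Made-up scores for ten conversations, for illustration only.
sample_scores = [5, 4, 4, 3, 3, 2, 5, 4, 1, 4]
shares = band_shares(sample_scores)
print(shares)
```

Because every conversation has a score, the denominator is the full conversation count rather than the handful of people who answered a survey.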

CSAT still has a role as an open door for customers who want to share their experience directly, but it’s the scored view of every conversation that tells you what’s really happening across your entire customer experience, which is especially important as AI scales.

How to set targets

You can’t map old CSAT targets onto a new metric. The coverage is different, so you need to build targets from the data itself.

At Fin, we started by correlating CX Score with operational metrics like first response time and time to close. That gave us useful targets for human support, but we wanted to take a deeper look at how Fin, our AI agent, was performing.
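If you want to try the same kind of correlation on your own data, a minimal sketch might look like the following. It uses a plain Pearson coefficient, which is an assumption on our part since the article doesn’t name the method used, and the sample numbers are invented.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)


# Made-up data: first response time in minutes and the score each
# conversation received.
first_response_minutes = [1, 2, 3, 4, 5]
cx_scores = [5, 4, 4, 2, 1]

r = pearson(first_response_minutes, cx_scores)
print(f"correlation: {r:.2f}")
```

A strongly negative value here would suggest that slower first responses go with lower scores, which is the kind of relationship you can turn into an operational target. Correlation alone doesn’t show cause, so treat it as a pointer to where to look.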

So we broke CX Score down by its underlying attributes: answer quality, customer effort, and product feedback. Fin’s answer quality had the biggest impact on the overall score, which makes sense, since it handles the majority of conversations. That told us where to focus.

With our automation rate at roughly 80%, we modelled what our score would look like if we eliminated low answer quality across both Fin and human conversations. We set initial targets based on those models, historical performance, and team ambition:

Fin support: 80%
Human support: 70%
Overall: 78%
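As a rough illustration of this kind of modelling, the sketch below does two things. First, it runs an optimistic what-if that treats every conversation flagged for low answer quality as positive once fixed. Second, it blends per-channel targets by volume. Both are assumptions: the article doesn’t describe the model itself, the `Conversation` fields are hypothetical, and the sample data is invented. The blend is at least consistent with the targets above, since 80% of volume at 80 plus 20% of volume at 70 works out to 78.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    positive: bool            # scored positive today
    low_answer_quality: bool  # hypothetical flag taken from the score's reasons


def positive_share(conversations: list[Conversation]) -> float:
    """Percentage of conversations currently scored positive."""
    if not conversations:
        raise ValueError("need at least one conversation")
    return 100 * sum(c.positive for c in conversations) / len(conversations)


def what_if_share(conversations: list[Conversation]) -> float:
    """Optimistic what-if: assume fixing low answer quality makes a
    conversation positive. Real outcomes will be less clean."""
    if not conversations:
        raise ValueError("need at least one conversation")
    fixed = sum(c.positive or c.low_answer_quality for c in conversations)
    return 100 * fixed / len(conversations)


def blended_target(automation_rate: float, ai_target: float,
                   human_target: float) -> float:
    """Volume-weighted blend of the AI and human targets."""
    if not 0 <= automation_rate <= 1:
        raise ValueError("automation_rate must be between 0 and 1")
    return automation_rate * ai_target + (1 - automation_rate) * human_target


# Made-up sample: 6 positive, 2 held back by answer quality, 2 by other causes.
sample = (
    [Conversation(True, False)] * 6
    + [Conversation(False, True)] * 2
    + [Conversation(False, False)] * 2
)
current = positive_share(sample)
modelled = what_if_share(sample)
overall = blended_target(0.80, ai_target=80.0, human_target=70.0)
print(current, modelled, round(overall, 1))
```

The gap between `current` and `modelled` gives an upper bound on what fixing answer quality could buy you, which is a more defensible basis for a target than an inherited CSAT threshold.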

We’ve since raised these targets as performance has continued to improve. The point is to build targets from what the data actually shows when you can see everything, rather than carrying over a threshold from a metric like CSAT, which measured something different.

From measurement to action

When every conversation is scored and the reasons behind each score are visible, recurring problems become traceable. You can identify which issues are driving negative scores, how often they’re happening, and whether the root cause sits with support, product, or a specific workflow.

For example, you can see:

Which topics and conversation types are scoring poorly, and why.
How scores differ across channels, and between AI agent and human conversations.
Which operational issues are creating friction for customers.
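The breakdowns above amount to grouping scored conversations by a dimension and ranking the groups. Here is a minimal sketch, assuming each conversation is exported as a record with `topic`, `channel`, `handled_by`, and `score` fields. Those field names, the sample records, and the rule that a score of 2 or below counts as negative are all assumptions made for illustration.

```python
from collections import defaultdict

# Made-up records, for illustration only.
conversations = [
    {"topic": "billing", "channel": "chat", "handled_by": "ai", "score": 2},
    {"topic": "billing", "channel": "email", "handled_by": "human", "score": 1},
    {"topic": "billing", "channel": "chat", "handled_by": "ai", "score": 4},
    {"topic": "login", "channel": "chat", "handled_by": "ai", "score": 5},
    {"topic": "login", "channel": "chat", "handled_by": "human", "score": 4},
    {"topic": "shipping", "channel": "email", "handled_by": "human", "score": 3},
    {"topic": "shipping", "channel": "chat", "handled_by": "ai", "score": 2},
    {"topic": "login", "channel": "email", "handled_by": "ai", "score": 5},
]


def negative_rate_by(records: list[dict], key: str) -> dict[str, float]:
    """Share of conversations scored negative (assumed: score <= 2) per group."""
    totals: dict[str, int] = defaultdict(int)
    negatives: dict[str, int] = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        if record["score"] <= 2:
            negatives[group] += 1
    return {group: negatives[group] / totals[group] for group in totals}


def worst_first(rates: dict[str, float]) -> list[tuple[str, float]]:
    """Sort groups so the highest negative rate comes first."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


by_topic = negative_rate_by(conversations, "topic")
by_handler = negative_rate_by(conversations, "handled_by")

for topic, rate in worst_first(by_topic):
    print(f"{topic}: {rate:.0%} negative")
```

The same function works for any field, so you can swap `"topic"` for `"channel"` or `"handled_by"` to see where the friction concentrates before deciding who should own the fix.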

That changes the operational loop. Instead of working from a small number of survey responses and caveating how representative the data is, teams can route issues to the right owner, address them at the source, track whether the fix actually worked, and prevent the same problem from affecting the next customer.

How insights flow back into support

With full visibility into what customers are experiencing, managers can identify patterns that would never surface from individual survey responses – recurring pain points, common friction in handoffs, and topics where answer quality is consistently low.

A manager might see that a particular topic is underperforming across the team and use that to update content or run a focused session on how it should be handled. Each pattern leads to a specific action, instead of a vague signal that something might be off.

One caveat here is that scores need context – a high-touch team working complex issues will score differently from a high-volume team handling transactional queries, and that’s expected. You need to compare like with like.
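One way to compare like with like is to measure each team against its own history rather than against another team’s raw number. The sketch below assumes you track a percentage score per team per period; the segment names and figures are invented.

```python
from statistics import mean


def baseline_by_segment(history: dict[str, list[float]]) -> dict[str, float]:
    """Average past score for each segment, used as that segment's baseline."""
    return {segment: mean(values) for segment, values in history.items()}


def delta_vs_baseline(current: dict[str, float],
                      baselines: dict[str, float]) -> dict[str, float]:
    """How far each segment's current score sits from its own baseline."""
    return {segment: current[segment] - baselines[segment] for segment in current}


# Made-up percentage scores for two hypothetical teams.
history = {
    "complex": [62.0, 64.0, 66.0],
    "transactional": [84.0, 86.0, 88.0],
}
current = {"complex": 68.0, "transactional": 83.0}

baselines = baseline_by_segment(history)
deltas = delta_vs_baseline(current, baselines)
print(deltas)
```

In this invented example the transactional team still has the higher raw score, but it is the one slipping against its own baseline, while the complex team is improving. That is the signal a side-by-side comparison of raw scores would hide.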

The opportunity in the middle

Historically, quality improvement focused on reducing bad experiences. When you can see every conversation, a different opportunity emerges. You can improve the interactions that aren’t terrible, but aren’t memorable either.

“Fine” is a ceiling you can raise. What’s keeping these middle-ground conversations at a 3, and what would move them to a 4 or 5?

As Jared Ellis from Culture Amp put it:

"We now have this neutral ground where we're not doing something terrible to that customer experience. The customer probably walks away and says, 'yeah, that was fine.' But what we actually want is, 'yeah, that was great.' It's so interesting that we now get that opportunity." – Jared Ellis, Culture Amp

When you can see what’s happening in every conversation, you can go beyond fixing what’s broken and start improving what’s invisible: the fine-but-forgettable middle that no survey would have caught.