We just launched Operator, an Agent for your customer operations that helps you understand, manage, and improve your entire customer experience.
To give you an idea of how powerful this Agent is, we’re sharing more about its technical infrastructure and the engineering decisions that went into ensuring Operator works reliably at production scale across thousands of customer workspaces.
If you’re a technical leader evaluating whether to build something like this yourself, or trying to understand the difference between a well-prompted LLM and a production Agent system, this is for you.
Escaping the “it’s just an LLM” trap
Most engineering teams that evaluate this space start the same way: a prototype. Take a foundation model, give it API access to your support data, add a system prompt with some domain context, and you’ve got something that queries your database, summarizes tickets, and generates reports that look right. It demos convincingly.
The problem with that prototype is that it obscures the scope of what’s actually required. It demonstrates the 10% of the system that’s straightforward to build, and it’s easy to assume the rest is just as straightforward. It isn’t. The gap between a working demo and a production system your team depends on daily is where most of the engineering investment lives.
With Operator, we’ve invested deeply in every layer: tooling, reasoning, how the Agent takes action, and the infrastructure that makes it reliable at scale. Here’s a closer look.
The tooling layer
The first thing we had to confront was that the obvious approach (giving a model access to your APIs and letting it figure things out) doesn’t hold up in production. The model makes reasonable decisions for simple queries, but when you’re operating across thousands of customer workspaces with different configurations, data models, and usage patterns, a “figure it out” approach isn’t nearly precise enough.
What you need is purpose-built tooling: tools that encode decisions about what data to fetch, how to structure it, what context to include, and what to leave out. Operator has over 50 of these tools and 10 skills.
A tool is a single action that Operator takes (search content, run a query, look up a conversation). A skill chains multiple tools together to complete a whole job, like debugging a conversation end-to-end, rolling out a content update across an entire help center, or identifying the next automation opportunity.
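To make the distinction concrete, here is a minimal sketch of the tool/skill relationship in Python. The class names, the toy tools, and the chaining logic are all illustrative assumptions, not Operator’s actual API:

```python
# Hypothetical sketch of the tool/skill distinction: a Tool is one
# well-scoped action, a Skill is a fixed chain of tools that completes
# a whole job. Names are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """A single action with its context-shaping decisions encoded."""
    name: str
    run: Callable[[dict], dict]

@dataclass
class Skill:
    """A whole job: a chain of tools, each feeding the next."""
    name: str
    steps: list

    def run(self, context: dict) -> dict:
        for tool in self.steps:
            context.update(tool.run(context))
        return context

# Two toy tools chained into one "debug a conversation" skill.
lookup = Tool("lookup_conversation",
              lambda ctx: {"conversation": {"id": ctx["conversation_id"], "escalated": True}})
diagnose = Tool("diagnose",
                lambda ctx: {"diagnosis": "escalated" if ctx["conversation"]["escalated"] else "resolved"})

debug_skill = Skill("debug_conversation", [lookup, diagnose])
result = debug_skill.run({"conversation_id": "c_123"})
print(result["diagnosis"])  # escalated
```

The point of the structure is that each tool bakes in decisions about what data to fetch and how to shape it, so the skill’s chain stays deterministic rather than leaving every step to the model.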
The difference between using thin wrappers around API endpoints and purpose-built tooling shows up in something as seemingly simple as a performance question. When you ask “how did Fin perform last week?”, a naive implementation runs a query and hands back a table. Operator runs a reporting tool that determines which metrics are relevant for your specific workspace, which are meaningful for your particular question, and what the numbers actually mean in context, giving you a much richer answer that you can do something tangible with.
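A toy contrast makes the difference visible. In this sketch the metric names and the per-workspace “enabled” logic are invented; the real reporting tool is far richer, but the shape of the decision is the same:

```python
# Naive wrapper vs. purpose-built reporting tool (illustrative only).
raw_metrics = {"resolution_rate": 0.82, "csat": 4.4, "handoff_rate": 0.0}
enabled_in_workspace = {"resolution_rate", "csat"}  # handoff tracking is off here

def naive_report(metrics: dict) -> dict:
    """Thin API wrapper: dump every number, meaningful or not."""
    return dict(metrics)

def reporting_tool(metrics: dict, enabled: set) -> dict:
    """Keep only metrics this workspace actually uses, and attach
    enough context to make each number interpretable."""
    report = {}
    for name, value in metrics.items():
        if name in enabled:
            report[name] = {"value": value,
                            "note": f"{name} is tracked in this workspace"}
    return report

print(sorted(reporting_tool(raw_metrics, enabled_in_workspace)))
# ['csat', 'resolution_rate']  — handoff_rate never reaches the answer
```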
Developing that behavior took months of engineering. Not because any individual piece is conceptually hard, but because getting it right across the full range of customer workspaces, configurations, and edge cases is an iterative process. You build it, you test it against real conversations, you find the cases where it breaks, you fix those, and you repeat. There’s no shortcut.
The intelligence layer
The tooling layer solves what to do, but beneath it is a harder problem: understanding what’s worth doing, and why. This is the layer that makes Operator understand your business rather than just query it. Three components go into it:
1. Semantic search
Unlike solutions that rely on keyword matching, Operator uses a system that understands what content is about, not just what words it contains. When it searches your help center, it’s using the same semantic search engine we’ve spent years optimizing for Fin itself. This is a retrieval system that’s been tuned against millions of real support conversations, with precision and recall characteristics we’ve measured and improved continuously.
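A toy example shows why meaning beats keywords. Production systems use learned embeddings; here a hand-made synonym map stands in for them, which is enough to illustrate the retrieval behavior:

```python
# Toy semantic retrieval: map tokens to concepts, then rank by cosine
# similarity. The synonym map is a stand-in for a learned embedding.
from collections import Counter
import math

SYNONYMS = {"reimbursement": "refund", "money-back": "refund",
            "terminate": "cancel"}

def embed(text: str) -> Counter:
    # Canonicalize each token to its concept, then count concepts.
    return Counter(SYNONYMS.get(t, t) for t in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {"refund-policy": "how to request a reimbursement",
        "cancel-account": "how to terminate your subscription"}

query = "get my money-back"
best = max(docs, key=lambda d: cosine(embed(query), embed(docs[d])))
print(best)  # refund-policy
```

A keyword matcher would score both documents zero for this query; the concept mapping is what lets “money-back” find the reimbursement article.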
2. Attribute awareness
Operator has access to your data and knows what is meaningful for different questions. It knows which metrics are actually in use in your workspace, which custom attributes carry signals, and which fields are populated versus effectively empty. We’ve built specific skills that give Operator this meta-knowledge, so when it’s investigating a performance question, it’s looking at the right things, not hallucinating insights from sparse data.
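The “populated versus effectively empty” check can be sketched in a few lines. The fill-rate threshold and field names below are made up for illustration; the idea is simply to exclude sparse attributes before reasoning over them:

```python
# Hypothetical attribute-awareness check: before reasoning over a custom
# attribute, verify it actually carries signal in this workspace.
def usable_attributes(records: list, fields: list, min_fill: float = 0.5) -> list:
    """Keep only fields populated in at least min_fill of records."""
    usable = []
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, "", []))
        if records and filled / len(records) >= min_fill:
            usable.append(field)
    return usable

conversations = [
    {"csat": 4,    "product_area": "billing", "region": None},
    {"csat": 5,    "product_area": "auth",    "region": None},
    {"csat": None, "product_area": "billing", "region": "eu"},
    {"csat": 3,    "product_area": None,      "region": None},
]

print(usable_attributes(conversations, ["csat", "product_area", "region"]))
# ['csat', 'product_area']  — 'region' is effectively empty, so skip it
```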
3. Intelligent reasoning
A well-built Agent can answer your question and anticipate what you should ask next. If you ask Operator about escalations spiking, it doesn’t just say, “escalations increased 23% week-over-week.” It’ll continue on to tell you why this happened by examining the escalated conversations and identifying that a disproportionate number involved a specific product area, before moving on to check whether the relevant help content is up to date, and, if it isn’t, proposing an update.
That chain of reasoning isn’t prompt engineering. It’s encoded in the skills we’ve built, refined against the patterns we see across our entire customer base.
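The investigation chain described above (measure the spike, find the dominant product area, check whether its help content is stale) can be sketched as a deterministic pipeline. All data, thresholds, and helper names here are invented:

```python
# Illustrative diagnostic chain for an escalation spike. The 10% spike
# threshold and 90-day staleness window are assumptions, not Operator's.
from collections import Counter
from datetime import date

def diagnose_escalations(this_week, last_week, articles, today):
    change = (len(this_week) - len(last_week)) / len(last_week)
    if change <= 0.10:
        return {"finding": "no significant spike"}
    # Step 2: which product area dominates the escalated conversations?
    area, _ = Counter(c["product_area"] for c in this_week).most_common(1)[0]
    # Step 3: is the help content for that area up to date?
    stale = [a for a in articles
             if a["product_area"] == area and (today - a["updated"]).days > 90]
    return {"finding": f"escalations up {change:.0%}, concentrated in {area}",
            "proposal": [a["title"] for a in stale]}

this_week = [{"product_area": "billing"}] * 8 + [{"product_area": "auth"}] * 2
last_week = [{"product_area": "billing"}] * 4 + [{"product_area": "auth"}] * 4
articles = [{"title": "Billing FAQ", "product_area": "billing", "updated": date(2024, 1, 5)},
            {"title": "Auth guide",  "product_area": "auth",    "updated": date(2024, 9, 1)}]

report = diagnose_escalations(this_week, last_week, articles, date(2024, 10, 1))
print(report["finding"])  # escalations up 25%, concentrated in billing
```

The value of encoding the chain in a skill rather than a prompt is that each step’s output is structured, so the next step consumes data rather than re-parsing prose.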
The action layer
This is where the engineering complexity increases by an order of magnitude because instead of just analyzing problems and recommending solutions, Operator takes action to solve them itself. It can update Guidance rules, draft and publish help articles, create Procedures, configure data connectors, and modify your Fin configuration.
Every one of these actions has to be safe, reversible, and auditable. An analytics tool that occasionally returns a wrong number is frustrating; an Agent that occasionally applies a wrong configuration change to a live support system is a different category of problem.
To prevent this, we built a robust proposal system, whereby every change Operator suggests is presented as a reviewable diff. You see exactly what will change before anything is applied, with the option to accept, reject, or refine. Nothing goes live without your explicit approval.
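A minimal sketch of the review-before-apply pattern, using a unified diff so the reviewer sees exactly what would change. The function and field names are illustrative, not Operator’s real API:

```python
# Hypothetical proposal workflow: generate a reviewable diff, and apply
# nothing without explicit approval.
import difflib

def propose_change(current: str, proposed: str, title: str) -> dict:
    diff = "\n".join(difflib.unified_diff(
        current.splitlines(), proposed.splitlines(),
        fromfile=f"{title} (live)", tofile=f"{title} (proposed)", lineterm=""))
    return {"title": title, "diff": diff, "status": "pending"}

def apply_if_approved(proposal: dict, approved: bool, live: dict, proposed: str) -> dict:
    # Nothing goes live without explicit approval.
    if approved:
        live[proposal["title"]] = proposed
        proposal["status"] = "applied"
    else:
        proposal["status"] = "rejected"
    return proposal

live_articles = {"Refund policy": "Refunds take 10 days."}
new_text = "Refunds take 5 business days."
p = propose_change(live_articles["Refund policy"], new_text, "Refund policy")
print(p["diff"])  # shows the -old / +new lines before anything changes
apply_if_approved(p, approved=True, live=live_articles, proposed=new_text)
```

Keeping the proposal as data (rather than applying changes inline) is also what makes the audit trail straightforward: every applied change has a stored diff and an explicit approval attached.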
What else sets Operator apart
Beyond the technical complexities that power Operator behind the scenes, we’ve also worked hard to build a great user experience.
A UI that’s both conversational and graphical, not one or the other
Operator blends conversational interaction with purpose-built graphical components:
- Proposal diffs that show exactly what will change in an article.
- Inline charts that visualize performance trends.
- Dashboards that render directly inside the conversation thread.
This means that when a knowledge manager reviews a proposed content update, they see a structured diff, not a wall of LLM-generated text. When a team lead asks about weekly performance, they get a chart with clear axes and context, rather than a paragraph approximating the data in prose.
Building this kind of hybrid UI is extremely difficult outside of a native platform integration. In a chat interface or CLI, you’re limited to text output; in a standalone dashboard, you lose conversational context.
Operator does both in the same thread, so every interaction is detailed and context-rich.
It lives where your team already works
Operator is built into the same platform your team uses every day. It’s not a separate tool with a separate login, nor is it a Slack bot your engineer set up that only three people know about. It operates exactly where you are, alongside the conversations, help center articles, workflows, and data you’re working with.
This closes the distance between spotting a problem and resolving it: when your knowledge manager spots an outdated article while reviewing a Fin conversation, Operator can surface the fix in the same session. When a team lead notices an escalation spike in the morning, they can ask Operator to investigate without switching tools, waiting for a data pull, or filing a ticket with your engineering team.
A custom-built tool will always live outside the workflow. An engineer builds it, maintains it, and often, is the only one who knows how to use it. Operator is accessible to anyone who can type a question in plain language, which turns it into a system your whole team runs on.
The compounding advantage
Every customer using Operator teaches us something. We see which debugging approaches work across different types of support operations, learn which content structures perform better, and can identify automation strategies that consistently land. Those patterns get encoded back into Operator’s skills and tools.
When we discover that a particular sequence of investigation steps reliably identifies the root cause of a spike in escalations, we build that into Operator’s diagnostic skill. When we find that a specific way of structuring help articles leads to higher Fin resolution rates, we encode that into the content creation skill. Our engineering team is continuously shipping improvements based on what we observe across the entire customer base.
A custom-built solution gives you exactly what you built, meaning it doesn’t get smarter unless you invest engineering resources into making it smarter. And that means resources not spent on your core product.
We’re not locking the door
Some teams want to build their own Agents. Some of our most technical customers do this. But when you do, you’re working with raw APIs and building your own tooling on top of them. When you use Operator, you’re working with a system that already knows what questions to ask, understands your data, and encodes the best practices we’ve learned from thousands of support teams.
We recently launched the Fin CLI, which means you can use third-party agents like Claude Code or Cursor to interact with your Fin data and configuration. That door is open. What we hope this post has clarified is everything that goes into building Operator:
- Over 50 tools and 10 skills, purpose-built for support operations.
- Years of investment in semantic search.
- Deep integration with every layer of Fin’s stack.
- The proposal system.
- The intelligence layer.
- The reliability infrastructure.
If you’d still like to move ahead with building a custom solution, here’s an honest assessment:
You can build a useful read-only tool in weeks. It’ll query your data, summarize tickets, and generate reports, but turning it into a production system will take quarters. Reliability, security, edge case handling, multi-tenant data isolation, and graceful degradation are all important architectural decisions that you’ll need to get right from the start.
The action layer is also where you might risk stalling out. Going from “here’s what’s wrong” to safely making changes in a production system is a fundamentally different engineering problem than analysis. Most DIY projects never get there.
Finally, you’ll be maintaining it forever. Every model upgrade, API change, and new capability in your support platform means updating your custom tooling. We have a team dedicated to this. You’ll need one too.
Our CTO Darragh Curran wrote an in-depth post about the pros and cons of building vs. buying an AI Agent, which is worth a read. The economics still favor buying when a vendor has invested more in the problem than you can justify internally. What I hope this post adds is a clearer picture of what that investment actually looks like from an engineering perspective.
The investment is ongoing. The problems we’re solving at the infrastructure level today are harder than the ones we solved a year ago, and that trajectory isn’t slowing down.
