Writing effective criteria is what separates a Monitor that surfaces real issues from one that floods your queue with noise. This guide covers best practices for both Monitor flag criteria and Scorecard attribute descriptions. Monitors currently evaluate Fin AI Agent conversations only.
Note: Monitors is available as part of the Pro add-on.
Monitor flag criteria vs. scorecard attribute descriptions
These two types of criteria work differently, so they need to be written differently.
Aspect | Monitor flag criteria | Scorecard criteria descriptions
Purpose | Decides which conversations get reviewed | Defines how each conversation is evaluated
Logic | Yes/no - each Monitor runs independently | Competitive - the AI selects the single best match
Key challenge | Reduce false positives and false negatives | Eliminate overlap between criteria values
Best practices for writing Monitor flag criteria
Monitors run as independent yes/no checks. Multiple Monitors can flag the same conversation - and that is fine. Because of this, precision matters more than distinction.
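The difference between the two evaluation modes can be sketched in code. This is a conceptual illustration only, not Intercom's implementation: the monitor checks and criteria scorers below are toy stand-ins for the AI's judgment, and all names are hypothetical.

```python
# Conceptual sketch (NOT Intercom's implementation). Toy keyword checks
# stand in for the AI evaluation; all names here are hypothetical.

def run_monitors(conversation: str, monitors: dict) -> list[str]:
    """Monitors are independent yes/no checks; several can flag
    the same conversation."""
    return [name for name, check in monitors.items() if check(conversation)]

def score_criteria(conversation: str, criteria: dict) -> str:
    """Scorecard criteria compete: only the single best match is chosen."""
    return max(criteria, key=lambda name: criteria[name](conversation))

monitors = {
    "frustration": lambda c: "ridiculous" in c.lower(),
    "cancellation": lambda c: "cancel my subscription" in c.lower(),
}
criteria = {
    "Tone - Rude or Dismissive": lambda c: 0.2,
    "Churn risk": lambda c: 0.9 if "cancel" in c.lower() else 0.1,
}

convo = "This is ridiculous. Cancel my subscription."
print(run_monitors(convo, monitors))   # both Monitors flag the conversation
print(score_criteria(convo, criteria)) # exactly one criterion is selected
```

The point of the sketch: two Monitors flagging the same conversation is expected behavior, while scorecard criteria resolve to a single winner, which is why overlap matters for scorecards but not for Monitors.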
1. Describe observable behavior, not inferred intent
Focus on what appears in the conversation.
Avoid: Customer is frustrated
Prefer: Customer uses phrases such as This is unacceptable, I am extremely disappointed, or This is ridiculous.
The AI performs better when evaluating explicit signals rather than emotional interpretations.
2. Include concrete examples
Examples dramatically reduce ambiguity.
Use explicit phrasing patterns: e.g., cancel my subscription, close my account, delete my data
Examples anchor the model to real-world language.
3. Add explicit exclusions
Reducing false positives is critical for Monitors.
Example: Customer uses profanity. EXCLUDE: mild language such as damn or crap. If something should not trigger the monitor, say so clearly.
4. Use quantifiable thresholds
Avoid vague wording.
Bad: Fin gives a short response.
Better: Fin response is fewer than 50 words.
Specific thresholds improve consistency.
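A quantifiable threshold like "fewer than 50 words" is easy to state precisely because it maps to a simple, deterministic check. A minimal sketch (the function name and word-count definition are assumptions for illustration, not part of the product):

```python
def is_short_response(text: str, max_words: int = 50) -> bool:
    """Toy version of the criterion 'Fin response is fewer than 50 words'.
    Words are counted by whitespace splitting - an assumption for this sketch."""
    return len(text.split()) < max_words

print(is_short_response("Please try clearing your cache."))  # True
```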
5. Break multi-step logic into numbered criteria
If your Monitor depends on sequence or pattern, structure it clearly:
1. Customer expresses frustration.
2. Fin responds without acknowledging emotion.
3. Customer repeats the complaint.
This makes the logic deterministic and easier to evaluate.
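The three numbered steps above amount to an ordered pattern over conversation turns. A conceptual sketch, assuming conversations are lists of (speaker, text) pairs and using toy keyword matching as a stand-in for the AI's judgment:

```python
# Conceptual sketch of ordered multi-step criteria. The keyword checks
# are toy stand-ins for AI evaluation; all names are hypothetical.

def in_order(turns, steps):
    """Return True if each step matches some turn, in sequence."""
    i = 0
    for turn in turns:
        if i < len(steps) and steps[i](turn):
            i += 1
    return i == len(steps)

steps = [
    lambda t: t[0] == "customer" and "unacceptable" in t[1].lower(),  # 1. frustration
    lambda t: t[0] == "fin" and "sorry" not in t[1].lower(),          # 2. no acknowledgment
    lambda t: t[0] == "customer" and "still" in t[1].lower(),         # 3. repeated complaint
]

turns = [
    ("customer", "This is unacceptable, my export failed."),
    ("fin", "Please try clearing your cache."),
    ("customer", "It still fails - this is unacceptable."),
]
print(in_order(turns, steps))  # True: all three steps occur in order
```

Numbering the steps makes the intended sequence unambiguous, both for the AI evaluating the criteria and for teammates reading them later.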
6. Keep it simple
If the rule is straightforward, do not overcomplicate it.
Example: Fin suggests next steps (e.g., Please try clearing your cache, Log out and back in, Click this link).
Clarity beats complexity.
7. Use 'explicitly' to require direct customer language
If your Monitor should only trigger when a customer directly states something — not just implies it — include the word "explicitly" in your criteria. Without it, the AI may infer intent from context and match conversations where the behavior was only suggested, not stated.
Without "explicitly": Customer requests a call back — could match "Can you connect me to the security team?" since the AI may infer this implies a request for direct contact.
With "explicitly": Customer explicitly requests a call back — only matches if the customer directly asks, e.g., "Can I get a call?" or "Please call me."
Tip: Use the Test Monitor tool to validate your criteria against real conversations before turning it on. Update the flag criteria and rerun the test until results accurately reflect what you want the Monitor to capture.
Best practices for writing scorecard criteria descriptions
Start with the core principle: criteria compete. The AI looks at the full list and selects the single best match for each criterion. Your job is to make that choice obvious.
1. Use clear, concise names
Keep names short and specific. Someone reading the list should immediately understand the purpose without opening the description.
Bad: Customer Communication Issues
Better: Tone - Rude or Dismissive
2. Write comprehensive descriptions
Descriptions carry most of the classification signal.
Explicitly describe all conversation types that belong.
Include keywords, common phrasings, and examples.
Think through edge cases and include them.
Clarify what good and bad instances look like.
The description should make it easy for the AI to recognize real-world phrasing, not just abstract definitions.
3. Make criteria clearly distinct
Criteria within the same scorecard should not compete conceptually.
Avoid semantic overlap.
Ensure each attribute has a clear boundary.
If two attributes could reasonably apply for the same reason, refine one of them.
It is fine if a single conversation fits multiple criteria across the scorecard. What matters is that within each criteria set, the values are clearly separable.
4. Evaluate quality systematically
When reviewing your taxonomy, assess each criterion on:
Clarity / conciseness
Description comprehensiveness
Criteria distinction
Overlapping criteria (if any)
Final score + commentary
This structured review forces you to tighten definitions and reduce ambiguity - which directly improves classification performance.
FAQs
How long should my flag criteria be?
There is no fixed length - the right length is whatever it takes to describe the behavior precisely. A simple Monitor might only need two or three sentences. A complex one (like detecting multi-step failure patterns) may need a structured, numbered description. Err on the side of more detail rather than less.
Can I use the same scorecard criteria across multiple scorecards?
Yes - criteria titles and descriptions are reusable. Once you have created a criterion, you can add it to multiple scorecards. Note that previous rating scores cannot be reused and will need to be set from scratch in each scorecard.
What is the difference between monitor flag criteria and a scorecard criteria description?
Monitor flag criteria determines whether a conversation gets pulled into a Monitor at all - it is a yes/no filter. Scorecard criteria descriptions define how each conversation is scored once it is in the Monitor. Think of the Monitor as the net and the scorecard as the ruler.
