Auto-classifying customer support tickets with AI and routing them to the right team
Have AI read each incoming support ticket, figure out what it's about (billing, technical, sales, cancellation etc.), set its priority and send it to the right team. In the POC playbook a model emerged as a ship candidate; here we make it production-ready: hybrid flow (simple rule + cheap model + strong-model fallback), critical-ticket escalation and weekly drift monitoring. All copy-paste Python.
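The production flow has four parts: (1) a hotword rule handles ~15% of tickets with no model call, (2) a cheap model (gpt-4o-mini) handles ~75%, (3) a strong fallback (claude-opus-4-7) catches the ~10% hard cases, (4) a rule forces critical tickets to a human. Cost goes down, quality goes up.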
1. Production prompt + JSON schema
The loose POC prompt won't do. In production the model must return every required field, pick its category from a fixed enum, and never pass unparseable output downstream. Validating each response against the schema catches malformed output and routes that ticket to a human.
# routing_prompt.txt — production prompt
Classify the following support ticket.
Return valid JSON only, no extra text.
Categories:
- billing (payment, invoice, refund)
- technical (integration, error, API)
- sales (pre-sale, pricing)
- bug (software defect)
- cancellation (account close, subscription stop)
- other
Ticket:
{{ticket_text}}
JSON schema:
{
  "category": "billing|technical|sales|bug|cancellation|other",
  "priority": "low|medium|high",
  "team": "billing_support|tech_support|sales|core_eng|account_ops|triage",
  "confidence": 0.0,
  "escalation_required": true,
  "rationale": "short reason, max 80 chars"
}
The Parel API supports response_format: json_object; it forces the
model away from non-JSON output (works on gpt-4o-mini, qwen3-max and
claude-opus-4-7).
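The validation step mentioned at the top of this section could look roughly like the sketch below. It is a hand-rolled standard-library check, not a full JSON Schema validator. The function name `validate_decision` and the [0, 1] range for `confidence` are assumptions; the allowed values are copied from the prompt above.

```python
import json

# Allowed enum values, copied from the prompt's JSON template.
ALLOWED = {
    "category": {"billing", "technical", "sales", "bug", "cancellation", "other"},
    "priority": {"low", "medium", "high"},
    "team": {"billing_support", "tech_support", "sales", "core_eng", "account_ops", "triage"},
}


def validate_decision(raw: str) -> tuple[bool, list[str]]:
    """Return (is_valid, errors) for one raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value is not a JSON object"]

    errors = []
    for field, allowed in ALLOWED.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed value")

    conf = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence: must be a number in [0, 1] (assumed range)")

    if not isinstance(data.get("escalation_required"), bool):
        errors.append("escalation_required: must be true or false")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or len(rationale) > 80:
        errors.append("rationale: must be a string of at most 80 characters")

    return not errors, errors
```

A caller would send any ticket whose response fails this check to a human queue instead of trusting the partial output.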
2. Hybrid routing: rule + two models
Sending all traffic to one model is both slow and expensive. A three-layer router:
| Layer | Trigger | Model | Approx % traffic |
|---|---|---|---|
| 1. Rule | Hotwords like "refund", "chargeback" | — | 15% |
| 2. Cheap | Default flow | gpt-4o-mini | 75% |
| 3. Fallback | confidence < 0.75 | claude-opus-4-7 | 10% |
# router.py: hybrid routing (rule + cheap model + strong fallback)
import json
import os

from openai import OpenAI

# Assumption: the API key is supplied through the PAREL_API_KEY environment variable.
client = OpenAI(api_key=os.environ["PAREL_API_KEY"], base_url="https://api.parel.cloud/v1")

CONFIDENCE_THRESHOLD = 0.75

with open("routing_prompt.txt", encoding="utf-8") as f:
    PROMPT = f.read()


def classify(model: str, ticket: str) -> dict:
    """Ask one model to classify a ticket and parse its JSON reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Return valid JSON only."},
            {"role": "user", "content": PROMPT.replace("{{ticket_text}}", ticket)},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


def route(ticket: str) -> dict:
    # Layer 1: rule-based hotwords (fastest, free)
    lower = ticket.lower()
    if any(w in lower for w in ["refund", "chargeback", "money back"]):
        return {"category": "billing", "team": "billing_support", "via": "rule"}

    # Layer 2: cheap model (gpt-4o-mini), resolves ~75% of traffic
    decision = classify("gpt-4o-mini", ticket)
    # A missing confidence field counts as low confidence and falls through
    if decision.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD:
        decision["via"] = "gpt-4o-mini"
        return decision

    # Layer 3: strong model fallback, only for hard cases
    decision = classify("claude-opus-4-7", ticket)
    decision["via"] = "claude-opus-4-7"
    return decision

Result: average cost drops (the 10× more expensive model only sees 10% of traffic), accuracy rises (hard cases are handled by the strong model), and p95 latency stays stable (the cheap model handles 75%).
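To make the cost claim concrete, here is a small back-of-the-envelope calculation. The traffic shares come from the routing table and the 10× ratio from the text; the unit costs are relative placeholders, not real prices, and the function name is only illustrative.

```python
# Traffic shares from the routing table.
RULE_SHARE = 0.15      # resolved by the hotword rule, no model call
CHEAP_SHARE = 0.75     # resolved by the cheap model
FALLBACK_SHARE = 0.10  # falls through to the strong model

# Relative cost units (assumption): only the 10x ratio is taken from the text.
CHEAP_COST = 1.0
STRONG_COST = 10.0


def blended_cost_per_ticket() -> float:
    # Every ticket that gets past the rule layer pays for one cheap call,
    # including the tickets that later fall through to the strong model.
    cheap_calls = CHEAP_SHARE + FALLBACK_SHARE
    strong_calls = FALLBACK_SHARE
    return cheap_calls * CHEAP_COST + strong_calls * STRONG_COST


hybrid = blended_cost_per_ticket()
print(f"hybrid:     {hybrid:.2f} units/ticket")
print(f"all-strong: {STRONG_COST:.2f} units/ticket")
print(f"all-cheap:  {CHEAP_COST:.2f} units/ticket")
print(f"saving vs all-strong: {1 - hybrid / STRONG_COST:.1%}")
```

Under these assumptions the hybrid flow costs about 1.85 units per ticket against 10 for sending everything to the strong model. It is still more expensive than an all-cheap setup (1 unit), which is the price of better accuracy on the hard cases.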
3. Escalation guardrail
Some tickets must never be left to the model. Three rules force them straight to a human:
# escalation.py: force critical tickets to a human, not the model
HIGH_PRIORITY_KEYWORDS = ["urgent", "asap", "production down", "data loss", "gdpr", "kvkk"]


def needs_human_override(ticket: str, decision: dict) -> bool:
    # 1. Model itself flagged it
    if decision.get("escalation_required"):
        return True
    # 2. High priority and low confidence
    if decision.get("priority") == "high" and decision.get("confidence", 0) < 0.85:
        return True
    # 3. Critical keyword present: escalate whatever the model decided.
    #    This check needs only the ticket text, so it can also run before any model call.
    lower = ticket.lower()
    if any(kw in lower for kw in HIGH_PRIORITY_KEYWORDS):
        return True
    return False

This logic runs independently of model accuracy: a ticket that trips any of the three rules goes to a human, whatever the model decided. In production this is the most important safety net.
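One way to wire the guardrail around the router is sketched below. The keyword check runs first so that critical tickets never reach a model; the other two rules run on the model's decision. The names `handle_ticket`, `HUMAN_QUEUE` and the `human_override` field are hypothetical, and the router is passed in as `route_fn` so the sketch runs without API access.

```python
from typing import Callable

HIGH_PRIORITY_KEYWORDS = ["urgent", "asap", "production down", "data loss", "gdpr", "kvkk"]
HUMAN_QUEUE = "human_review"  # hypothetical queue name, not from the original setup


def handle_ticket(ticket: str, route_fn: Callable[[str], dict]) -> dict:
    """Apply the escalation rules around an arbitrary routing function."""
    # Keyword rule first: no model call is made for these tickets.
    lower = ticket.lower()
    if any(kw in lower for kw in HIGH_PRIORITY_KEYWORDS):
        return {"team": HUMAN_QUEUE, "via": "keyword_override", "human_override": True}

    decision = route_fn(ticket)

    # Remaining rules depend on what the model returned.
    flagged = bool(decision.get("escalation_required"))
    shaky_high = decision.get("priority") == "high" and decision.get("confidence", 0) < 0.85
    if flagged or shaky_high:
        # Keep the model's fields for the human reviewer, but change the destination.
        return {**decision, "team": HUMAN_QUEUE, "human_override": True}
    return decision
```

In production `route_fn` would be the `route` function from router.py; in tests it can be a stub that returns a fixed decision.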
4. Weekly drift monitoring
Model quality degrades over time (new product features, language shifts, provider-side updates). Catch accuracy drops early with weekly auto-eval:
# weekly_drift.py: re-run the eval set in production weekly
import csv
import statistics

from router import classify


def send_slack_alert(message: str) -> None:
    # Placeholder: swap in your own Slack webhook or alerting client.
    print(f"[ALERT] {message}")


def weekly_drift_check(eval_csv: str, baseline: dict) -> dict:
    correct = 0
    confidences = []
    with open(eval_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            out = classify("gpt-4o-mini", row["ticket_text"])
            correct += out["category"] == row["expected_category"]
            confidences.append(out["confidence"])
    if not confidences:
        raise ValueError("eval set is empty")
    # Accuracy over the rows actually evaluated
    accuracy = correct / len(confidences)
    avg_conf = statistics.mean(confidences)
    drift = {
        "accuracy_drop": baseline["accuracy"] - accuracy,
        "confidence_drop": baseline["avg_confidence"] - avg_conf,
    }
    if drift["accuracy_drop"] > 0.03:  # drop of 3 percentage points -> alert
        send_slack_alert(f"Routing drift: accuracy {accuracy:.1%}")
    return drift

Run weekly via cron. If accuracy drops more than 3 percentage points below the baseline, send a Slack alert; if it drops more than 5 points, trigger automatic fallback (temporary switch to claude-opus-4-7).
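The drift check only sends the alert; the 5-point fallback switch could be expressed as a small decision function like the one below. This is a sketch: the name `drift_action` and the shape of the returned dict are assumptions, and how the chosen default model is propagated to the router is left to the deployment.

```python
ALERT_DROP = 0.03     # accuracy drop (absolute) that triggers a Slack alert
FALLBACK_DROP = 0.05  # accuracy drop that switches the default model

CHEAP_MODEL = "gpt-4o-mini"
STRONG_MODEL = "claude-opus-4-7"


def drift_action(accuracy_drop: float) -> dict:
    """Map an accuracy drop to the action described in the text."""
    if accuracy_drop > FALLBACK_DROP:
        # Severe drift: alert and temporarily send default traffic to the strong model.
        return {"alert": True, "default_model": STRONG_MODEL}
    # Mild or no drift: keep the cheap model, alert only above the lower threshold.
    return {"alert": accuracy_drop > ALERT_DROP, "default_model": CHEAP_MODEL}
```

The weekly job would call `drift_action(drift["accuracy_drop"])` on the result of the drift check and store the chosen default model wherever the router reads its configuration.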
Pilot timeline
Week 1: 5% traffic
Push the new router to production, but send only 5% of traffic through it. The old rule-based system runs in parallel as a control. Watch: accuracy, escalation override count, false-positive billing routes.
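A common way to carve out a stable 5% slice is to hash a ticket identifier into a bucket, so the same ticket always lands on the same side of the split. The sketch below assumes each ticket has a string ID; `in_rollout` is a hypothetical helper name.

```python
import hashlib


def in_rollout(ticket_id: str, percent: float) -> bool:
    """Return True if this ticket falls inside the rollout percentage."""
    digest = hashlib.sha256(ticket_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in 0..9999 (0.01% resolution).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


# Example: decide which system handles a ticket during week 1.
system = "new_router" if in_rollout("ticket-12345", 5) else "legacy_rules"
```

Raising `percent` from 5 to 50 and then 100 keeps every ticket that was already in the rollout inside it, which makes week-over-week comparisons cleaner than random sampling.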
Weeks 2-3: 50% traffic
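If week 1 was clean, expand to 50%. Drift monitor runs weekly. Watch the cost dashboard for the gpt-4o-mini vs claude-opus-4-7 split (target: about 75% of tickets resolved by gpt-4o-mini and 10% by claude-opus-4-7, with the remaining 15% handled by the rule layer).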
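If no dashboard is available yet, the split can be computed from logged routing decisions, since each decision carries a `via` field. A minimal sketch, assuming the decisions are available as a list of dicts (`via_split` is a hypothetical helper):

```python
from collections import Counter


def via_split(decisions: list[dict]) -> dict[str, float]:
    """Share of tickets resolved by each layer, keyed by the 'via' field."""
    counts = Counter(d.get("via", "unknown") for d in decisions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {via: n / total for via, n in counts.items()}


# Example with a tiny hand-made log of 20 decisions.
log = (
    [{"via": "rule"}] * 3
    + [{"via": "gpt-4o-mini"}] * 15
    + [{"via": "claude-opus-4-7"}] * 2
)
for via, share in via_split(log).items():
    print(f"{via}: {share:.0%}")
```

A fallback share well above 10% suggests the confidence threshold is too strict or the cheap model is drifting.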
Month 1: 100% traffic
Full migration. Old rule-based system is shut down. Drift monitor + escalation override summary lands as a weekly report for the support manager.