Auto-routing · Hands-on

Auto-classifying customer support tickets with AI and routing them to the right team

Have AI read each incoming support ticket, figure out what it's about (billing, technical, sales, cancellation etc.), set its priority and send it to the right team. In the POC playbook a model emerged as a ship candidate; here we make it production-ready: hybrid flow (simple rule + cheap model + strong-model fallback), critical-ticket escalation and weekly drift monitoring. All copy-paste Python.

Time10 min read · 2 hours to apply
RoleBackend, support ops
OutputProduction routing API
Production routing is four-layered: (1) simple rules skip the model on ~15% of traffic, (2) cheap model (gpt-4o-mini) handles ~75%, (3) strong fallback (claude-opus-4-7) catches the ~10% hard cases, (4) a rule forces critical tickets to a human. Cost goes down, quality goes up.

1. Production prompt + JSON schema

The loose POC prompt won't do. In production the model must produce required fields, an enum category and zero error margin. JSON schema validation catches malformed output and routes it to a human.

routing_prompt.txt
# routing_prompt.txt — production prompt
Classify the following support ticket.
Return valid JSON only, no extra text.

Categories:
- billing (payment, invoice, refund)
- technical (integration, error, API)
- sales (pre-sale, pricing)
- bug (software defect)
- cancellation (account close, subscription stop)
- other

Ticket:
{{ticket_text}}

JSON schema:
{
  "category": "billing|technical|sales|bug|cancellation|other",
  "priority": "low|medium|high",
  "team": "billing_support|tech_support|sales|core_eng|account_ops|triage",
  "confidence": 0.0,
  "escalation_required": true,
  "rationale": "short reason, max 80 chars"
}

The Parel API supports response_format: json_object; it forces the model away from non-JSON output (works on gpt-4o-mini, qwen3-max and claude-opus-4-7).

2. Hybrid routing: rule + two models

Sending all traffic to one model is both slow and expensive. A three-layer router:

LayerTriggerModelApprox % traffic
1. RuleHotwords like "refund", "chargeback"15%
2. CheapDefault flowgpt-4o-mini75%
3. Fallbackconfidence < 0.75claude-opus-4-710%
router.py
# router.py — hybrid routing: rule + cheap model + strong fallback
import json
from openai import OpenAI

client = OpenAI(api_key=PAREL_API_KEY, base_url="https://api.parel.cloud/v1")

CONFIDENCE_THRESHOLD = 0.75
PROMPT = open("routing_prompt.txt").read()

def classify(model, ticket):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Return valid JSON only."},
            {"role": "user", "content": PROMPT.replace("{{ticket_text}}", ticket)},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def route(ticket: str) -> dict:
    # Layer 1: rule-based hotwords (fastest, free)
    lower = ticket.lower()
    if any(w in lower for w in ["refund", "chargeback", "money back"]):
        return {"category": "billing", "team": "billing_support", "via": "rule"}

    # Layer 2: cheap model (gpt-4o-mini) — handles ~85% of cases
    decision = classify("gpt-4o-mini", ticket)
    if decision["confidence"] >= CONFIDENCE_THRESHOLD:
        decision["via"] = "gpt-4o-mini"
        return decision

    # Layer 3: strong model fallback — only hard cases
    decision = classify("claude-opus-4-7", ticket)
    decision["via"] = "claude-opus-4-7"
    return decision

Result: average cost drops (the 10× expensive model only sees 10% of traffic), accuracy rises (hard cases handled by the strong model), p95 latency stays stable (cheap model handles 75%).

3. Escalation guardrail

Some tickets must never be left to the model. Three rules force them straight to a human:

escalation.py
# escalation.py — force critical tickets to a human, not the model
HIGH_PRIORITY_KEYWORDS = ["urgent", "asap", "production down", "data loss", "gdpr", "kvkk"]

def needs_human_override(ticket: str, decision: dict) -> bool:
    # 1. Model itself flagged it
    if decision.get("escalation_required"):
        return True

    # 2. High priority and low confidence
    if decision.get("priority") == "high" and decision.get("confidence", 0) < 0.85:
        return True

    # 3. Critical keyword present, skip the model entirely
    lower = ticket.lower()
    if any(kw in lower for kw in HIGH_PRIORITY_KEYWORDS):
        return True

    return False

This logic runs independently of model accuracy: a critical ticket is never handed off to AI. In production this is the most important safety net.

4. Weekly drift monitoring

Model quality degrades over time (new product features, language shifts, provider-side updates). Catch accuracy drops early with weekly auto-eval:

weekly_drift.py
# weekly_drift.py — re-run the eval set in production weekly
import csv, statistics
from router import classify

def weekly_drift_check(eval_csv: str, baseline: dict) -> dict:
    correct = 0
    confidences = []
    for row in csv.DictReader(open(eval_csv)):
        out = classify("gpt-4o-mini", row["ticket_text"])
        correct += out["category"] == row["expected_category"]
        confidences.append(out["confidence"])

    accuracy = correct / 50
    avg_conf = statistics.mean(confidences)

    drift = {
        "accuracy_drop": baseline["accuracy"] - accuracy,
        "confidence_drop": baseline["avg_confidence"] - avg_conf,
    }

    if drift["accuracy_drop"] > 0.03:  # 3% drop -> alert
        send_slack_alert(f"Routing drift: accuracy {accuracy:.1%}")

    return drift

Run weekly via cron. If accuracy drops more than 3%, send a Slack alert; if it drops more than 5%, trigger automatic fallback (temporary switch to claude-opus-4-7).

Pilot timeline

Week 1: 5% traffic

Push the new router to production but only 5% of traffic goes through it. The old rule-based system runs in parallel as a control. Watch: accuracy, escalation override count, false-positive billing routes.

Weeks 2-3: 50% traffic

If week 1 was clean, expand to 50%. Drift monitor runs weekly. Watch the cost dashboard for the gpt-4o-mini vs claude-opus-4-7 split (target: 75/10).

Month 1: 100% traffic

Full migration. Old rule-based system is shut down. Drift monitor + escalation override summary lands as a weekly report for the support manager.