
How to run an AI POC with Parel: a 1-day hands-on guide

This page is hands-on, not conceptual. With your Parel API key, the ready CSV below and the Python script you'll copy-paste, you'll have a working POC that compares three models side by side in about an hour. By the end you'll hold a decision table you can present to your manager.

Time: 8 min read · 1 hour to apply
Role: Backend, ML eng., PM
Output: POC scorecard + Python code
Sample task in this guide: classify 50 support tickets into 6 categories. Using Parel's single API, we test 3 models (gpt-4o-mini, qwen3-max, claude-opus-4-7) with the same code, and in one hour produce clear answers to "is this task fit for AI, which model is enough, what does it cost".

To adapt this to your own use case (summarization, extraction, FAQ, QA), only the prompt and the test set change. The steps stay the same.

Step 0: What do you need?

Everything required for the POC:

  • Parel account + API key. Generate one at app.parel.cloud/api-keys (format parel_pk_...). New users get a $1 promo credit on signup.
  • $3 prepaid is enough. 50 tickets × 3 models = 150 requests totalling about $0.30. The Parel Compare UI adds ~$0.05 if you use it.
  • Python 3.10+ and the openai package. Parel is OpenAI SDK compatible; no extra library needed.
  • A 50-example test set. An example CSV is in this guide; or build one from your own data (5-10 minutes of work).
setup (3 minutes)
# Python 3.10+ and the openai package are enough
pip install openai

# Set your Parel API key as an env var
export PAREL_API_KEY="parel_pk_xxxxxxxxxxxx"

Step 1: Prepare the test set

The test set is the foundation of the POC. The CSV below is an example: either expand these 6 rows to 50 (sampling real tickets from your system) or rebuild a 50-row CSV in the same format for your own task. Balance three difficulty levels: easy (category is obvious), medium (close to two categories) and hard (sarcastic tone, missing info, mixed topics).

support_50.csv (sample rows)
ticket_text,expected_category,priority
"I was charged twice for the same order.",billing,high
"My API key returns 401 in production.",technical,high
"How does invoicing work if we upgrade to Pro?",sales,medium
"I want to close my account, what is the process?",cancellation,low
"My webhooks are randomly returning 502.",bug,high
"Thanks team, you've been very helpful.",other,low
Practical tip: Sample 50 random tickets from your production traffic. Then label them yourself (the expected_category column). That's a 30-minute job, but it's what makes the POC trustworthy.
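Before spending credit, it can help to sanity-check the labelled CSV. The sketch below is one possible check, not part of Parel: `validate_test_set` is a hypothetical helper, the column names and six categories are taken from the sample above, and the third sample row (with the label `refunds`) is invented here purely to show a failure.

```python
import csv
import io
from collections import Counter

# The six categories used by this guide's prompt.
CATEGORIES = {"billing", "technical", "sales", "bug", "cancellation", "other"}
REQUIRED_COLUMNS = {"ticket_text", "expected_category", "priority"}


def validate_test_set(csv_file):
    """Return (label_counts, problems) for an open CSV file object."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return Counter(), [f"missing columns: {sorted(missing)}"]

    counts, problems = Counter(), []
    for row in reader:
        line_no = reader.line_num  # physical line just read (header is line 1)
        label = (row["expected_category"] or "").strip()
        if not (row["ticket_text"] or "").strip():
            problems.append(f"line {line_no}: empty ticket_text")
        if label not in CATEGORIES:
            problems.append(f"line {line_no}: unknown category {label!r}")
        else:
            counts[label] += 1
    return counts, problems


# Illustrative input: two rows from the sample plus one deliberately bad label.
sample = io.StringIO(
    "ticket_text,expected_category,priority\n"
    '"I was charged twice for the same order.",billing,high\n'
    '"My webhooks are randomly returning 502.",bug,high\n'
    '"Where is my refund?",refunds,low\n'
)
counts, problems = validate_test_set(sample)
print(dict(counts))
print(problems)
```

For a real file, pass `open("support_50.csv", newline="", encoding="utf-8")` instead of the `StringIO` sample. The label counts also show at a glance whether one category dominates the set.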

Step 2: Smoke test (verify connectivity)

Before running the full runner, verify your connection to Parel with a single request. This step takes 30 seconds and confirms your API key works and you have credit.

curl smoke test
# Single request to verify the connection
curl https://api.parel.cloud/v1/chat/completions \
  -H "Authorization: Bearer $PAREL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-max",
    "messages": [
      {"role": "user", "content": "One sentence: which model are you?"}
    ],
    "max_tokens": 64
  }'

# Expected: 200 + JSON (choices[0].message.content)
# 401 = API key empty or wrong. 404 = wrong model name.

If you get 200 with a model name and an English sentence in the response, you're ready. 401 means $PAREL_API_KEY is empty or wrong; 404 means a wrong model name (e.g. qwen3.max instead of qwen3-max).
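If you prefer to stay in Python, the same check can be sketched with the openai package. This is an illustration under assumptions: `explain_status` and `smoke_test` are hypothetical helper names, the hints only cover the two status codes this guide mentions, and the request is skipped when `PAREL_API_KEY` is not set.

```python
import os

# Hints for the status codes this guide mentions; anything else falls through.
HINTS = {
    401: "PAREL_API_KEY is empty or wrong",
    404: "wrong model name (for example qwen3.max instead of qwen3-max)",
}


def explain_status(status_code):
    """Turn an HTTP status code into a one-line hint."""
    hint = HINTS.get(status_code, "unexpected status; check the response body")
    return f"{status_code}: {hint}"


def smoke_test(api_key, model="qwen3-max"):
    """Send one short request and return a human-readable result line."""
    # Imported here so the helper above works without the openai package.
    from openai import OpenAI, APIStatusError

    client = OpenAI(api_key=api_key, base_url="https://api.parel.cloud/v1")
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": "One sentence: which model are you?"}
            ],
            max_tokens=64,
        )
    except APIStatusError as err:
        # Raised by the SDK for 4xx/5xx responses; carries the HTTP status.
        return explain_status(err.status_code)
    return "200: " + (response.choices[0].message.content or "")


api_key = os.environ.get("PAREL_API_KEY", "")
if api_key:
    print(smoke_test(api_key))
else:
    print("PAREL_API_KEY is not set; skipping the request.")
```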

Step 3: The Python runner that tests three models side by side

Save the file below as run_eval.py, place the support_50.csv from step 1 in the same folder, and run it. The script sends 50 tickets to 3 models sequentially and measures accuracy, p95 latency and token usage for each. Notice we test models from three different providers with a single API key: this is Parel's most practical advantage.

run_eval.py
# run_eval.py: run 50 tickets through 3 models side by side
import csv, json, os, statistics, time
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PAREL_API_KEY"],  # the env var exported in step 0
    base_url="https://api.parel.cloud/v1",
)

# 3 models with different characters
MODELS = [
    "gpt-4o-mini",      # cheap + fast
    "qwen3-max",        # open-source reference
    "claude-opus-4-7",  # strong reasoning
]

PROMPT = """Classify the following support ticket into exactly one of:
billing, technical, sales, bug, cancellation, other

Return JSON only:
{"category": "...", "confidence": 0.0}

Ticket:
"""

def classify(model, ticket):
    """Send one ticket to one model; return (parsed JSON, latency in ms, usage)."""
    started = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Return valid JSON only."},
            {"role": "user", "content": PROMPT + ticket},
        ],
        temperature=0,
    )
    latency_ms = int((time.perf_counter() - started) * 1000)
    out = json.loads(response.choices[0].message.content)
    return out, latency_ms, response.usage

# Read the test set once so every model sees exactly the same rows
with open("support_50.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

results = {}
for model in MODELS:
    correct, latencies, total_tokens = 0, [], 0
    for i, row in enumerate(rows, start=1):
        try:
            out, latency, usage = classify(model, row["ticket_text"])
            correct += out["category"] == row["expected_category"]
            latencies.append(latency)
            total_tokens += usage.total_tokens
        except Exception as e:
            # A failed or unparseable response counts as a wrong answer
            print(f"{model} failed on row {i}: {e}")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # It needs at least two data points, so guard against near-total failure.
    p95 = int(statistics.quantiles(latencies, n=20)[-1]) if len(latencies) > 1 else None
    results[model] = {
        "accuracy": correct / len(rows),  # divide by the real row count, not a fixed 50
        "p95_ms": p95,
        "total_tokens": total_tokens,
    }

print(json.dumps(results, indent=2))

Run it:

  • python run_eval.py
  • 50 × 3 = 150 requests, takes 3-5 minutes
  • Output: accuracy, p95_ms, total_tokens per model

During the run, Parel routes each request to the right provider (OpenAI, DashScope, Anthropic) automatically. Even though the code uses the OpenAI SDK, it calls Qwen and Claude as well; switching providers takes nothing more than a different model name.
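One practical caveat: a model may wrap its JSON in a code fence or add a sentence around it, and a strict `json.loads` on the raw reply then counts that row as a failure. The sketch below is one tolerant way to read the reply; `parse_category` is a hypothetical helper, and lower-casing the label is an assumption about how strict you want the comparison to be.

```python
import json
import re

# The six categories listed in the prompt.
CATEGORIES = {"billing", "technical", "sales", "bug", "cancellation", "other"}


def parse_category(raw):
    """Extract a valid category from a model reply, or return None."""
    if not raw:
        return None
    # Grab everything from the first "{" to the last "}" so that code fences
    # or stray prose around the JSON object are ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    category = str(data.get("category", "")).strip().lower()
    # Labels outside the allowed set are treated as "no answer".
    return category if category in CATEGORIES else None


print(parse_category('{"category": "billing", "confidence": 0.9}'))
print(parse_category('```json\n{"category": "Bug", "confidence": 0.7}\n```'))
print(parse_category("I think this is a billing issue."))
```

In the runner you could compare `parse_category(response.choices[0].message.content)` with `expected_category` instead of indexing the parsed JSON directly; a `None` result then simply scores as wrong.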

Step 4: Read the results

The output looks roughly like the table below (your numbers will differ; the shape is the same):

Model            Accuracy  p95 latency  Estimated $/1K
gpt-4o-mini      88%       720 ms       $0.18
qwen3-max        92%       1.4 s        $0.42
claude-opus-4-7  94%       2.1 s        $1.85

Compare accuracy against cost. In this example, the most expensive model (claude-opus-4-7) has the highest accuracy but costs about 10× as much as gpt-4o-mini ($1.85 vs $0.18) for only +6 points. If 88% clears your quality threshold, gpt-4o-mini is likely your "ship" candidate on cost grounds.
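A single accuracy number can hide one weak category. The sketch below breaks accuracy down per expected category. It assumes you collect `(expected, predicted)` pairs during the run, which the runner in this guide does not store as written; `per_category_accuracy` is a hypothetical helper and the pairs shown are made-up illustration data, not real results.

```python
from collections import defaultdict


def per_category_accuracy(pairs):
    """pairs: iterable of (expected, predicted). Returns {category: (correct, total)}."""
    totals = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for expected, predicted in pairs:
        totals[expected][1] += 1
        if predicted == expected:
            totals[expected][0] += 1
    return {category: tuple(v) for category, v in totals.items()}


# Made-up pairs for illustration only.
pairs = [
    ("billing", "billing"),
    ("billing", "sales"),
    ("bug", "bug"),
    ("bug", "technical"),
    ("bug", "bug"),
]
breakdown = per_category_accuracy(pairs)
for category, (correct, total) in sorted(breakdown.items()):
    print(f"{category}: {correct}/{total}")
```

With only 50 tickets across 6 categories, each category has few examples, so treat the per-category numbers as a pointer to where to add test cases rather than as a precise measurement.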

Compute cost with one formula: cost = (total_tokens / 1,000) × price per 1K tokens. If the 50 tickets consume ~8K tokens in total, that's about $0.04 with gpt-4o-mini; for 100K tickets per month, ~$80.
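The arithmetic above can be checked in a few lines. The price of $0.005 per 1K tokens is not a published rate; it is simply what "$0.04 for ~8K tokens" implies, and `estimate_cost` is a hypothetical helper name. Substitute the blended price from your own Parel usage data.

```python
def estimate_cost(total_tokens, price_per_1k_tokens):
    """Cost in dollars: tokens are billed per 1,000."""
    return total_tokens / 1000 * price_per_1k_tokens


POC_TICKETS = 50
POC_TOKENS = 8_000     # ~8K tokens for the whole 50-ticket run (from the text)
PRICE_PER_1K = 0.005   # assumption: implied by $0.04 / 8K tokens

poc_cost = estimate_cost(POC_TOKENS, PRICE_PER_1K)

# Scale the per-ticket token usage up to a monthly volume.
tokens_per_ticket = POC_TOKENS / POC_TICKETS
monthly_tickets = 100_000
monthly_cost = estimate_cost(tokens_per_ticket * monthly_tickets, PRICE_PER_1K)

print(f"POC run: ${poc_cost:.2f}")
print(f"Tokens per ticket: {tokens_per_ticket:.0f}")
print(f"Monthly ({monthly_tickets:,} tickets): ${monthly_cost:.2f}")
```

This treats input and output tokens at one blended price. If your provider prices them separately, split `total_tokens` into prompt and completion tokens and sum the two costs.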

Step 5: Decision — ship / iterate / stop

You now have a table you can present to a manager. The decision isn't based on a single metric; it's the combination of quality, cost, latency and error impact:

Ship

One model passes the quality threshold, latency fits the budget, cost is understood. Start a 5-10% pilot, keep the old rule-based system as a control group. Re-run the eval set weekly in production to detect drift.

Iterate

Accuracy is just below the threshold. Try in order: add few-shot examples to the prompt, tighten the output schema (required fields), expand the test set (especially hard cases). Switching models is the last step; it's rarely needed.

Stop

No model passes the threshold, the error impact is too high (a wrong decision is hard to reverse) or business value is unclear. Splitting the use case or returning to a rule-based solution is also a valuable POC outcome. Knowing where AI doesn't fit is a win.
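The quantitative part of this decision can be written down as a small rule, sketched below. Everything here is an assumption to tune: `decide` is a hypothetical helper, the thresholds and the 5-point "near miss" margin are placeholders, and the scorecard reuses the example numbers from step 4. Error impact and business value stay a human judgment that this sketch does not model.

```python
def decide(scorecard, min_accuracy, max_p95_ms, near_miss=0.05):
    """Return ("ship", model), ("iterate", model) or ("stop", None)."""
    within_latency = [m for m in scorecard if m["p95_ms"] <= max_p95_ms]

    # Ship: at least one model clears both thresholds; prefer the cheapest.
    passing = [m for m in within_latency if m["accuracy"] >= min_accuracy]
    if passing:
        return "ship", min(passing, key=lambda m: m["cost"])["model"]

    # Iterate: a model is just below the accuracy threshold; start with the best.
    close = [m for m in within_latency if m["accuracy"] >= min_accuracy - near_miss]
    if close:
        return "iterate", max(close, key=lambda m: m["accuracy"])["model"]

    # Stop: nothing is close enough to be worth prompt tuning.
    return "stop", None


# Example numbers from the step 4 table (cost = the "Estimated $/1K" column).
scorecard = [
    {"model": "gpt-4o-mini", "accuracy": 0.88, "p95_ms": 720, "cost": 0.18},
    {"model": "qwen3-max", "accuracy": 0.92, "p95_ms": 1400, "cost": 0.42},
    {"model": "claude-opus-4-7", "accuracy": 0.94, "p95_ms": 2100, "cost": 1.85},
]

print(decide(scorecard, min_accuracy=0.85, max_p95_ms=1500))
print(decide(scorecard, min_accuracy=0.97, max_p95_ms=2500))
print(decide(scorecard, min_accuracy=0.99, max_p95_ms=500))
```

Agreeing on `min_accuracy` and `max_p95_ms` with your manager before running the POC keeps the result from being argued after the fact.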

Bonus: same POC, no code

If you'd rather not install Python, the Parel Compare UI does the same job. Upload the CSV, pick the models, click Run, see the same table after 2-3 minutes. The code version is more reproducible; the UI version is faster for PMs and non-devs.

Parel Compare UI
# Or, no code: Parel Compare UI
# https://app.parel.cloud/compare
#
# 1. Click "New run"
# 2. Upload your CSV (input + expected columns)
# 3. Pick 3 models (gemini-3-flash, qwen3-max, gpt-5.4)
# 4. "Run" → ~2-3 minutes later: quality + latency + cost table

What's next

POC done. The next playbooks take you a step further: