POST /agent-eval/cases/{case_id}/run
Run a single case
curl --request POST \
  --url https://api.goyappr.com/agent-eval/cases/{case_id}/run \
  --header 'Content-Type: application/json' \
  --data '
{
  "agent_overrides": {}
}
'
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "company_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "status": "completed",
  "mode": "text",
  "created_at": "2023-11-07T05:31:56Z",
  "case_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "case": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "company_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "agent_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "persona_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "name": "Yes path — caller agrees on first ask",
    "scenario": "The persona is responding to a missed call from your business about their recent inquiry. They have time to talk for 5 minutes.",
    "success_criteria": [
      {
        "weight": 1,
        "description": "<string>",
        "pattern": "<string>",
        "tool_name": "<string>",
        "args_match": {},
        "node_id": "<string>",
        "rubric": "<string>"
      }
    ],
    "max_turns": 20,
    "pass_threshold": 80,
    "tool_policy": "mock",
    "created_at": "2023-11-07T05:31:56Z",
    "agent": {
      "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
      "name": "<string>",
      "type": "prompt",
      "flow_config": {
        "nodes": [
          {
            "id": "<string>",
            "name": "<string>",
            "position": {
              "x": 123,
              "y": 123
            },
            "agent_speaks_first": true,
            "greeting": "<string>",
            "is_literal": false,
            "next_step_id": "<string>",
            "auto_advance": true
          }
        ],
        "flow_config_version": "1",
        "metadata": {
          "custom_metadata_keys": [
            "<string>"
          ]
        }
      },
      "system_prompt": "<string>",
      "description": "<string>",
      "background_sound_volume": 0.3,
      "temperature": 1,
      "greeting_message": "<string>",
      "agent_speaks_first": true,
      "vad_stop_secs": 0.5,
      "vad_start_secs": 0.2,
      "vad_confidence": 0.7,
      "silence_timeout_secs": 60,
      "max_continuous_speech_secs": 120,
      "max_call_duration_secs": 600,
      "lead_memory_enabled": true,
      "is_active": true,
      "webhook_url": "<string>",
      "webhook_events": [],
      "extraction_parameters": [
        {
          "name": "customerName",
          "description": "The caller's full name as mentioned during the conversation"
        }
      ],
      "created_at": "2023-11-07T05:31:56Z",
      "updated_at": "2023-11-07T05:31:56Z"
    },
    "persona": {
      "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
      "company_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
      "name": "Frustrated tenant",
      "identity_prompt": "You are a 38-year-old tenant calling about a leaking pipe in your kitchen. You're frustrated because this is the third time you've reported it.",
      "language": "en",
      "created_at": "2023-11-07T05:31:56Z",
      "description": "<string>",
      "behavior_traits": {
        "patience": "low",
        "verbosity": "chatty",
        "cooperation": "cooperative",
        "interruption_tendency": "occasional",
        "goal": "Get a maintenance technician scheduled today"
      },
      "voice_config": {},
      "updated_at": "2023-11-07T05:31:56Z",
      "deleted_at": "2023-11-07T05:31:56Z"
    },
    "suite_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "description": "<string>",
    "agent_overrides": {},
    "tool_allowlist": [],
    "updated_at": "2023-11-07T05:31:56Z",
    "deleted_at": "2023-11-07T05:31:56Z"
  },
  "suite_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "suite_run_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "queue_status": "done",
  "agent_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "persona_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "agent_model": "<string>",
  "persona_model": "<string>",
  "started_at": "2023-11-07T05:31:56Z",
  "ended_at": "2023-11-07T05:31:56Z",
  "duration_ms": 123,
  "score": 50,
  "pass_fail": true,
  "evaluation": {
    "score": 50,
    "pass_fail": true,
    "results": [
      {
        "assertion": {
          "weight": 1,
          "description": "<string>",
          "pattern": "<string>",
          "tool_name": "<string>",
          "args_match": {},
          "node_id": "<string>",
          "rubric": "<string>"
        },
        "passed": true,
        "weight": 123,
        "reason": "<string>"
      }
    ]
  },
  "agent_input_tokens": 0,
  "agent_output_tokens": 0,
  "persona_input_tokens": 0,
  "persona_output_tokens": 0,
  "agent_cost_cents": 0,
  "persona_cost_cents": 0,
  "total_cost_cents": 0,
  "error": "<string>",
  "agent_overrides": {}
}


Triggers one execution of the case. The request blocks for up to 60 seconds while the run fully finishes (the worker has scored the assertions and completed billing), so simple callers do not have to poll for short runs.

Response semantics

  • 200: the run finished within the 60-second window. The body is the full EvalRun with case (and its nested agent and persona) expanded; score, pass_fail, evaluation, and the cost fields are final.
  • 202: the run is still in progress after 60 seconds. The body is the latest non-terminal EvalRun. Keep polling GET /agent-eval/runs/{id} until status is terminal (completed, failed, or cancelled) AND queue_status === "done".
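A minimal client-side sketch of the two branches above, assuming you already have the status code and decoded body of the POST. `fetch_run` is a hypothetical caller-supplied function that performs GET /agent-eval/runs/{id} with whatever authentication your account uses; the polling interval and overall cap are illustrative defaults, not documented values.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the conversation is over (inferred from the status enum).
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def is_final(run: Dict[str, Any]) -> bool:
    """True once score, pass_fail, evaluation and the cost fields can be trusted."""
    return run.get("status") in TERMINAL_STATUSES and run.get("queue_status") == "done"


def wait_for_final(
    status_code: int,
    body: Dict[str, Any],
    fetch_run: Callable[[str], Dict[str, Any]],  # hypothetical GET /agent-eval/runs/{id}
    interval_s: float = 2.0,    # illustrative polling interval
    max_wait_s: float = 600.0,  # illustrative overall cap
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Return a fully scored EvalRun, polling only when the POST returned 202."""
    if status_code == 200:
        return body  # finished inside the 60-second window; already final
    if status_code != 202:
        raise RuntimeError(f"unexpected status {status_code}: {body}")
    run = body
    waited = 0.0
    while not is_final(run):
        if waited >= max_wait_s:
            raise TimeoutError(f"run {run['id']} not final after {max_wait_s}s")
        sleep(interval_s)
        waited += interval_s
        run = fetch_run(run["id"])
    return run
```

Checking both status and queue_status (rather than status alone) is what keeps the caller from reading a score or cost during the short window after the conversation ends but before the worker has finished.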

Per-run overrides

agent_overrides in the request body is the cleanest way to test “what if I changed the system_prompt to this?” without committing the change to the case. The per-run override is layered over the case’s own agent_overrides and recorded on the run row, so you can always trace which config produced which result.
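This page does not say exactly how the two layers are combined. The sketch below assumes a shallow, key-level merge in which per-run keys win; `expected_run_overrides` and `overrides_applied` are hypothetical helpers for reasoning about the agent_overrides snapshot on the returned run, not part of any SDK. Compare against the actual run row before relying on the assumed merge.

```python
from typing import Any, Dict


def expected_run_overrides(case_overrides: Dict[str, Any],
                           run_overrides: Dict[str, Any]) -> Dict[str, Any]:
    """ASSUMPTION: shallow merge, per-run keys replace case-level keys."""
    merged = dict(case_overrides)  # copy so the case's overrides are not mutated
    merged.update(run_overrides)
    return merged


def overrides_applied(run: Dict[str, Any], run_overrides: Dict[str, Any]) -> bool:
    """Check that every per-run override shows up in the run's snapshot."""
    snapshot = run.get("agent_overrides") or {}
    return all(snapshot.get(key) == value for key, value in run_overrides.items())
```

For example, a case that pins temperature and a run that swaps only system_prompt would, under this assumption, record both keys on the run row, with the new prompt replacing the case's one.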

No webhooks

Eval runs do not emit webhooks; polling is the only way to track them. For batches, use the suite-run aggregate (GET /agent-eval/suites/{id}/runs/{suite_run_id}).

Path Parameters

case_id
string<uuid>
required

Body

application/json
agent_overrides
object

Optional per-run overrides applied on top of the case's own overrides.
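A standard-library sketch of assembling this request from the path parameter and body. The base URL comes from the curl example; the authentication header is not documented on this page, so `extra_headers` is left for you to fill in. `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict, Optional

BASE_URL = "https://api.goyappr.com"  # taken from the curl example


def build_run_request(case_id: str,
                      agent_overrides: Optional[Dict[str, Any]] = None,
                      extra_headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    case_uuid = uuid.UUID(case_id)  # raises ValueError if case_id is not a UUID
    # Mirror the curl example: always send agent_overrides, empty if unused.
    body = {"agent_overrides": agent_overrides or {}}
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. your auth header (scheme not documented here)
    return urllib.request.Request(
        url=f"{BASE_URL}/agent-eval/cases/{case_uuid}/run",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Because the call can block for up to 60 seconds, give the client a timeout comfortably above that, for example `urllib.request.urlopen(req, timeout=90)`. A 202 is a 2xx status, so urlopen returns it normally and `resp.status` tells you which branch of the response semantics you are in.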

Response

Run finished within 60s; fully scored.

One execution of a case. Append-only after creation except for status/billing/lifecycle fields.

id
string<uuid>
required
company_id
string<uuid>
required
status
enum<string>
required

User-facing run status. It flips to completed or failed as soon as the conversation ends, BEFORE the worker scores the assertions and bills the run. Do not treat score, pass_fail, or total_cost_cents as final until queue_status === "done".

Available options:
queued,
running,
completed,
failed,
cancelled
mode
enum<string>
default:text
required

Always text in v1; voice is reserved for a future loopback mode.

Available options:
text,
voice
created_at
string<date-time>
required
case_id
string<uuid> | null

Foreign key to the case; nullable because a case can be deleted while its runs are kept as history.

case
object

A specific eval scenario — persona + target agent + scenario + success criteria.

suite_id
string<uuid> | null
suite_run_id
string<uuid> | null

Groups runs spawned by a single POST /agent-eval/suites/{id}/run call. Pass to GET /agent-eval/suites/{id}/runs/{suite_run_id} for the aggregate.

queue_status
enum<string>

Internal worker pipeline state: pending means the run is waiting in the queue, claimed means the cron worker has dispatched it to pipecat, and done means the worker has finished scoring and billing. Always wait for queue_status === "done" before reading the scoring or billing fields. The window between status === "completed" and queue_status === "done" is typically under 5 seconds but can be longer under contention.

Available options:
pending,
claimed,
done
agent_id
string<uuid> | null

Denormalized snapshot: the value of case.agent_id at the moment the run was created.

persona_id
string<uuid> | null
agent_model
string | null

Model identifier the agent ran on. Surfaced for cost auditing.

persona_model
string | null
started_at
string<date-time> | null
ended_at
string<date-time> | null
duration_ms
integer | null
score
number | null
Required range: 0 <= x <= 100
pass_fail
boolean | null
termination_reason
enum<string> | null

Why the run stopped.

Available options:
persona_goodbye,
agent_ended,
max_turns,
timeout,
error,
cancelled
evaluation
object

Populated when status is completed or failed.

agent_input_tokens
integer
default:0
agent_output_tokens
integer
default:0
persona_input_tokens
integer
default:0
persona_output_tokens
integer
default:0
agent_cost_cents
integer
default:0
persona_cost_cents
integer
default:0
total_cost_cents
integer
default:0

Total amount debited from the company's credit balance for this run.

error
string | null

Populated when status is failed.

agent_overrides
object

Snapshot of the case's agent_overrides plus any per-run overrides supplied at create time.