Trigger one execution of the case. Blocks up to 60 seconds waiting for the run to fully finish (worker assertions + billing complete) so simple callers don’t have to poll for short runs.
Required scope: agent_eval:run.
Response codes:
200: run finished within the 60s window. Body is the full EvalRun with case (and nested agent + persona) expanded; score, pass_fail, evaluation, and the cost columns are final.
202: still running after 60s. Body is the latest non-terminal EvalRun. Keep polling GET /agent-eval/runs/{id} until both status is terminal AND queue_status === "done".
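The 200/202 split suggests a simple client pattern: use the body directly on 200, and fall back to polling on 202. The following is a minimal sketch of that pattern, not an official client. `fetch_run` is a hypothetical callable standing in for an authenticated GET /agent-eval/runs/{id} request; the `id` field on the run body, the treatment of `cancelled` as a terminal status, and the poll interval and attempt cap are all assumptions.

```python
import time
from typing import Any, Callable, Dict

Run = Dict[str, Any]

# Assumption: these are the terminal values of `status`.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def is_final(run: Run) -> bool:
    """True once status is terminal AND the worker reports queue_status == "done"."""
    return (
        run.get("status") in TERMINAL_STATUSES
        and run.get("queue_status") == "done"
    )


def resolve_run(
    http_status: int,
    body: Run,
    fetch_run: Callable[[str], Run],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Run:
    """Return the final EvalRun given the trigger call's HTTP status and body.

    fetch_run is a hypothetical helper wrapping GET /agent-eval/runs/{id}.
    """
    if http_status == 200:
        return body  # already fully scored and billed
    if http_status != 202:
        raise ValueError(f"unexpected response code: {http_status}")

    run = body
    for _ in range(max_attempts):
        if is_final(run):
            return run
        sleep(interval_s)
        run = fetch_run(run["id"])  # assumes the run body carries an "id"
    if is_final(run):
        return run
    raise TimeoutError(f"run {run['id']} not final after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a production caller would likely also add backoff and handle transport errors.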
agent_overrides in the request body is the cleanest way to test “what if I changed the system_prompt to this” without committing the change to the case. The override is layered over the case’s own agent_overrides and recorded on the run row, so you can always trace back which config produced which result.
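To illustrate the layering, here is a sketch of how the effective configuration might be derived. The page does not say whether nested keys are merged deeply or replaced wholesale; this sketch assumes a shallow, key-by-key merge in which per-run values win. The function name and the `temperature` key are hypothetical; only `system_prompt` is named on this page.

```python
from typing import Any, Dict, Optional

Overrides = Dict[str, Any]


def effective_overrides(
    case_overrides: Overrides,
    run_overrides: Optional[Overrides],
) -> Overrides:
    """Layer per-run overrides on top of the case's own overrides.

    Assumption: a shallow merge where keys in run_overrides replace
    the same keys from the case. The real service may merge differently.
    """
    merged = dict(case_overrides)  # copy so the case config is not mutated
    merged.update(run_overrides or {})
    return merged


# Hypothetical case-level overrides and a per-run request body.
case_overrides = {"system_prompt": "You are a support agent.", "temperature": 0.2}
request_body = {
    "agent_overrides": {"system_prompt": "You are a terse support agent."}
}

snapshot = effective_overrides(case_overrides, request_body.get("agent_overrides"))
```

Under these assumptions, `snapshot` corresponds to what the run row would record: the case's settings with the per-run `system_prompt` substituted, while the case itself stays unchanged.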
GET /agent-eval/suites/{id}/runs/{suite_run_id}).
Optional per-run overrides applied on top of the case's own overrides.
Run finished within 60s; fully scored.
One execution of a case. Append-only after creation except for status/billing/lifecycle fields.
User-facing run status. Flips to completed/failed as soon as the conversation ends — BEFORE the worker scores assertions and bills. Do not treat score / pass_fail / total_cost_cents as final until queue_status === "done".
Allowed values: queued, running, completed, failed, cancelled
Always text in v1. voice is reserved for the future loopback mode.
Allowed values: text, voice
FK; nullable because cases can be deleted while runs are kept as history.
A specific eval scenario — persona + target agent + scenario + success criteria.
Groups runs spawned by a single POST /agent-eval/suites/{id}/run call.
Pass to GET /agent-eval/suites/{id}/runs/{suite_run_id} for the aggregate.
Internal worker pipeline state. pending = waiting in queue, claimed = the cron worker has dispatched it to pipecat, done = the worker has finished scoring + billing. Always poll for queue_status === "done" before reading the scoring / billing fields. The transient window between status === "completed" and queue_status === "done" is typically < 5 seconds but can be longer under contention.
Allowed values: pending, claimed, done
Denormalized snapshot: same as case.agent_id at the moment the run was created.
Model identifier the agent ran on. Surfaced for cost auditing.
Range: 0 <= x <= 100
Why the run stopped.
Allowed values: persona_goodbye, agent_ended, max_turns, timeout, error, cancelled
Populated when status is completed or failed.
Total amount debited from the company's credit balance for this run.
Populated when status='failed'.
Snapshot of the case's agent_overrides plus any per-run overrides supplied at create time.
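As a recap of how status and queue_status interact, the sketch below sorts a run into three phases and declines to return scoring or billing fields before the worker has finished. The phase labels and function names are illustrative and not part of the API; the field names (status, queue_status, score, pass_fail, total_cost_cents) are the ones mentioned on this page.

```python
from typing import Any, Dict, Optional

Run = Dict[str, Any]


def run_phase(run: Run) -> str:
    """Classify a run using both the user-facing and the worker state."""
    if run.get("status") in ("queued", "running"):
        return "in_progress"  # conversation has not ended yet
    if run.get("queue_status") != "done":
        return "awaiting_scoring"  # ended, but not yet scored and billed
    return "final"


def final_results(run: Run) -> Optional[Dict[str, Any]]:
    """Return scoring/billing fields only once they can be trusted."""
    if run_phase(run) != "final":
        return None
    return {key: run.get(key) for key in ("score", "pass_fail", "total_cost_cents")}
```

Returning `None` during the transient window between status === "completed" and queue_status === "done" makes it harder for a caller to read a provisional score by accident.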