POST /agent-eval/cases
Create case
curl --request POST \
  --url https://api.goyappr.com/agent-eval/cases \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "name": "Yes path — agreement on first ask",
  "agent_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "persona_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "scenario": "<string>",
  "description": "<string>",
  "suite_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "success_criteria": [],
  "max_turns": 20,
  "pass_threshold": 80,
  "agent_overrides": {},
  "tool_policy": "mock",
  "tool_allowlist": []
}
'
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "company_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "agent_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "persona_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "Yes path — caller agrees on first ask",
  "scenario": "The persona is responding to a missed call from your business about their recent inquiry. They have time to talk for 5 minutes.",
  "success_criteria": [
    {
      "weight": 1,
      "description": "<string>",
      "pattern": "<string>",
      "tool_name": "<string>",
      "args_match": {},
      "node_id": "<string>",
      "rubric": "<string>"
    }
  ],
  "max_turns": 20,
  "pass_threshold": 80,
  "tool_policy": "mock",
  "created_at": "2023-11-07T05:31:56Z",
  "agent": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "name": "<string>",
    "type": "prompt",
    "flow_config": {
      "nodes": [
        {
          "id": "<string>",
          "name": "<string>",
          "position": {
            "x": 123,
            "y": 123
          },
          "agent_speaks_first": true,
          "greeting": "<string>",
          "is_literal": false,
          "next_step_id": "<string>",
          "auto_advance": true
        }
      ],
      "flow_config_version": "1",
      "metadata": {
        "custom_metadata_keys": [
          "<string>"
        ]
      }
    },
    "system_prompt": "<string>",
    "description": "<string>",
    "background_sound_volume": 0.3,
    "temperature": 1,
    "greeting_message": "<string>",
    "agent_speaks_first": true,
    "vad_stop_secs": 0.5,
    "vad_start_secs": 0.2,
    "vad_confidence": 0.7,
    "silence_timeout_secs": 60,
    "max_continuous_speech_secs": 120,
    "max_call_duration_secs": 600,
    "lead_memory_enabled": true,
    "is_active": true,
    "webhook_url": "<string>",
    "webhook_events": [],
    "extraction_parameters": [
      {
        "name": "customerName",
        "description": "The caller's full name as mentioned during the conversation"
      }
    ],
    "created_at": "2023-11-07T05:31:56Z",
    "updated_at": "2023-11-07T05:31:56Z"
  },
  "persona": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "company_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "name": "Frustrated tenant",
    "identity_prompt": "You are a 38-year-old tenant calling about a leaking pipe in your kitchen. You're frustrated because this is the third time you've reported it.",
    "language": "en",
    "created_at": "2023-11-07T05:31:56Z",
    "description": "<string>",
    "behavior_traits": {
      "patience": "low",
      "verbosity": "chatty",
      "cooperation": "cooperative",
      "interruption_tendency": "occasional",
      "goal": "Get a maintenance technician scheduled today"
    },
    "voice_config": {},
    "updated_at": "2023-11-07T05:31:56Z",
    "deleted_at": "2023-11-07T05:31:56Z"
  },
  "suite_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "description": "<string>",
  "agent_overrides": {},
  "tool_allowlist": [],
  "updated_at": "2023-11-07T05:31:56Z",
  "deleted_at": "2023-11-07T05:31:56Z"
}

A case is the runnable unit of agent eval — persona + agent + scenario + success criteria. Suites are optional; a case with suite_id: null is an ad-hoc case you can run on its own.

Worked example

{
  "name": "Yes path — caller agrees on first ask",
  "agent_id": "...",
  "persona_id": "...",
  "scenario": "The persona is responding to a missed call from your business about their recent inquiry. They have 5 minutes to talk.",
  "success_criteria": [
    {
      "type": "must_say",
      "pattern": "would Tuesday at 3pm work for you",
      "weight": 1
    },
    {
      "type": "must_call_tool",
      "tool_name": "bookAppointment",
      "weight": 2
    },
    {
      "type": "must_not_say",
      "pattern": "guarantee",
      "weight": 1
    }
  ],
  "max_turns": 20,
  "pass_threshold": 80,
  "tool_policy": "mock"
}
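
As a rough sketch, the worked example above could be submitted from Python using only the standard library. The endpoint and headers mirror the curl example on this page. `build_create_case_request` is a hypothetical helper name, the ids are placeholders, and the example name is shortened for illustration. The request is built but not sent, so the snippet has no side effects.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.goyappr.com/agent-eval/cases"


def build_create_case_request(token: str, case: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a case.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(case).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder ids: substitute the UUIDs of your own agent and persona.
case = {
    "name": "Yes path: caller agrees on first ask",
    "agent_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "persona_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "scenario": "The persona is responding to a missed call from your business "
                "about their recent inquiry. They have 5 minutes to talk.",
    "success_criteria": [
        {"type": "must_say", "pattern": "would Tuesday at 3pm work for you", "weight": 1},
        {"type": "must_call_tool", "tool_name": "bookAppointment", "weight": 2},
        {"type": "must_not_say", "pattern": "guarantee", "weight": 1},
    ],
    "max_turns": 20,
    "pass_threshold": 80,
    "tool_policy": "mock",
}

request = build_create_case_request("<token>", case)

# Sending is left commented out on purpose:
# with urllib.request.urlopen(request) as response:
#     created_case = json.load(response)
```

Replace `<token>` with your API key before uncommenting the send step.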

Assertion shape

Each entry in success_criteria is one assertion with these fields:
| Field | Required for | Notes |
| --- | --- | --- |
| type | all | One of must_say, must_not_say, must_call_tool, must_reach_node, custom_llm_judge. |
| weight | optional | Relative weight in the case score (defaults to 1). |
| pattern | must_say / must_not_say | Regex evaluated against the full agent transcript, case-insensitive. Plain substrings work because they are valid regex. Escape literal punctuation (e.g. goodbye\\?). |
| tool_name | must_call_tool | The tool name the agent must invoke at least once during the run. |
| args_match | must_call_tool (optional) | Object whose key/value pairs must equal the corresponding values in the tool’s invocation args. Use "$present" to assert “key exists with any non-null value”. |
| node_id | must_reach_node | Flow node id the run must enter. Silently fails for prompt-mode agents, so use this only on flow agents. |
| rubric | custom_llm_judge | Natural-language description of what a successful run looks like. Engine stub in v1: returns passed=false, reason='not_implemented' until v1.1 ships full LLM-as-judge. |
| description | optional | Human-readable note shown in the dashboard alongside this assertion. |
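
To make the assertion semantics above concrete, here is a minimal sketch of how they might be evaluated locally. It is not the engine's implementation. The tool-call record shape (`{"name": ..., "args": ...}`), the requirement that a single call satisfy both tool_name and args_match, the score formula (passed weight divided by total weight, times 100), and the `>=` comparison against pass_threshold are all assumptions made for illustration.

```python
import re


def check_pattern(assertion: dict, agent_transcript: str) -> bool:
    """must_say / must_not_say: case-insensitive regex search over the agent transcript."""
    found = re.search(assertion["pattern"], agent_transcript, re.IGNORECASE) is not None
    return found if assertion["type"] == "must_say" else not found


def args_match(expected: dict, actual: dict) -> bool:
    """Every expected key must equal the actual arg; "$present" accepts any non-null value."""
    for key, want in expected.items():
        if want == "$present":
            if actual.get(key) is None:
                return False
        elif key not in actual or actual[key] != want:
            return False
    return True


def check_tool_call(assertion: dict, tool_calls: list) -> bool:
    """must_call_tool: at least one call with the right name (and args, if given).

    The {"name": ..., "args": ...} record shape is an assumption of this sketch.
    """
    expected = assertion.get("args_match", {})
    return any(
        call["name"] == assertion["tool_name"]
        and args_match(expected, call.get("args", {}))
        for call in tool_calls
    )


def evaluate(assertion: dict, agent_transcript: str, tool_calls: list) -> bool:
    kind = assertion["type"]
    if kind in ("must_say", "must_not_say"):
        return check_pattern(assertion, agent_transcript)
    if kind == "must_call_tool":
        return check_tool_call(assertion, tool_calls)
    # must_reach_node and custom_llm_judge are out of scope for this sketch.
    return False


def weighted_score(assertions: list, results: list) -> float:
    """Assumed formula: 100 * (weight of passed assertions) / (total weight)."""
    total = sum(a.get("weight", 1) for a in assertions)
    if total == 0:
        return 0.0
    earned = sum(a.get("weight", 1) for a, ok in zip(assertions, results) if ok)
    return 100.0 * earned / total


criteria = [
    {"type": "must_say", "pattern": "would Tuesday at 3pm work for you", "weight": 1},
    {"type": "must_call_tool", "tool_name": "bookAppointment", "weight": 2},
    {"type": "must_not_say", "pattern": "guarantee", "weight": 1},
]
agent_transcript = "Great. Would Tuesday at 3pm work for you? I can guarantee a slot."
tool_calls = [{"name": "bookAppointment", "args": {"slot": "Tuesday 15:00"}}]

results = [evaluate(a, agent_transcript, tool_calls) for a in criteria]
score = weighted_score(criteria, results)
passed = score >= 80  # the ">=" comparison is an assumption
```

In this run the agent says the forbidden word, so the must_not_say assertion fails and the case scores 75 under the assumed formula, below the pass_threshold of 80.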

Choosing tool_policy

| Policy | When to use |
| --- | --- |
| mock (default) | Your CI suite. Tools never fire; every call returns a synthetic success result. Free, deterministic. |
| real | One-off pre-prod check that the full integration works. Hits real systems; charges real third-party costs. |
| allowlist | Hybrid: list specific tools in tool_allowlist. The named tools fire for real, the rest mock. Useful when you want to validate one new tool but not regenerate calendar holds for the whole suite. |
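
The three policies above reduce to a small decision per tool call. The following sketch shows that decision under the semantics described on this page. `fires_for_real` is a hypothetical function name and `sendSms` is a made-up tool name; neither comes from the API.

```python
from typing import Optional, Sequence


def fires_for_real(
    tool_name: str,
    tool_policy: str = "mock",
    tool_allowlist: Optional[Sequence[str]] = None,
) -> bool:
    """Return True if the tool would hit real systems, False if it would be mocked."""
    if tool_policy == "mock":
        return False
    if tool_policy == "real":
        return True
    if tool_policy == "allowlist":
        # Only tools named in tool_allowlist fire for real; the rest are mocked.
        return tool_name in (tool_allowlist or [])
    raise ValueError(f"unknown tool_policy: {tool_policy!r}")
```

For example, with `tool_policy="allowlist"` and `tool_allowlist=["bookAppointment"]`, a call to `bookAppointment` would be real while a call to `sendSms` would be mocked.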

Authorizations

Authorization (string, header, required)

Your Yappr API key (e.g. ypr_live_...). Generate one in the dashboard under Settings → API Keys.

Body

application/json
name (string, required)
Example:

"Yes path — agreement on first ask"

agent_id (string<uuid>, required)
persona_id (string<uuid>, required)
scenario (string, required)
description (string)
suite_id (string<uuid> | null)
success_criteria (object[])
max_turns (integer, default: 20). Required range: 1 <= x <= 100
pass_threshold (number, default: 80). Required range: 0 <= x <= 100
agent_overrides (object)
tool_policy (enum<string>, default: mock). Available options: mock, real, allowlist
tool_allowlist (string[])
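
The body constraints listed above can be checked client-side before the request is sent. The sketch below covers the required fields, the UUID-typed ids, the two numeric ranges, and the tool_policy enum. `validate_case_body` and its error messages are hypothetical, and the server may apply further rules that are not documented here.

```python
import uuid
from typing import List

ALLOWED_POLICIES = ("mock", "real", "allowlist")


def _is_uuid(value) -> bool:
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def validate_case_body(body: dict) -> List[str]:
    """Return a list of problems found in a Create case body (empty if none)."""
    errors = []
    for field in ("name", "agent_id", "persona_id", "scenario"):
        if not body.get(field):
            errors.append(f"{field} is required")
    for field in ("agent_id", "persona_id", "suite_id"):
        value = body.get(field)
        # suite_id may be null; missing required ids are already reported above.
        if value not in (None, "") and not _is_uuid(value):
            errors.append(f"{field} must be a UUID")
    max_turns = body.get("max_turns", 20)
    if type(max_turns) is not int or not 1 <= max_turns <= 100:
        errors.append("max_turns must be an integer from 1 to 100")
    threshold = body.get("pass_threshold", 80)
    if type(threshold) not in (int, float) or not 0 <= threshold <= 100:
        errors.append("pass_threshold must be a number from 0 to 100")
    if body.get("tool_policy", "mock") not in ALLOWED_POLICIES:
        errors.append("tool_policy must be one of mock, real, allowlist")
    return errors


valid_body = {
    "name": "Yes path: caller agrees on first ask",
    "agent_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "persona_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "scenario": "The persona is responding to a missed call.",
}
problems = validate_case_body(valid_body)
```

Omitted optional fields fall back to the documented defaults (20, 80, mock), so a body with only the four required fields passes this check.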

Response

Case created

A specific eval scenario — persona + target agent + scenario + success criteria.

id (string<uuid>, required)
company_id (string<uuid>, required)
agent_id (string<uuid>, required)

Agent under test. Full agent record is expanded inline as agent in API responses.

persona_id (string<uuid>, required)
name (string, required)
Example:

"Yes path — caller agrees on first ask"

scenario (string, required)

Free-form, one-paragraph framing that the persona LLM receives on top of its identity. Describe the situation that prompted the call.

Example:

"The persona is responding to a missed call from your business about their recent inquiry. They have time to talk for 5 minutes."

success_criteria (object[], required)

Array of assertions evaluated after the run completes.

max_turns (integer, required, default: 20)

Hard cap on conversation turns. Hitting this terminates the run with termination_reason='max_turns'.

Required range: 1 <= x <= 100
pass_threshold (number, required, default: 80)

Weighted-score threshold (0-100) for pass_fail=true.

Required range: 0 <= x <= 100
tool_policy (enum<string>, required, default: mock)

How the agent's tools behave during the run. mock (default): every tool call returns a synthetic success result the worker fabricates from the tool's declared output schema. real: tools fire for real (charges real money, hits real systems). allowlist: tools whose name appears in tool_allowlist fire for real, the rest return mock results.

Available options: mock, real, allowlist
created_at (string<date-time>, required)
agent (object)
persona (object)

Reusable caller archetype consumed by eval cases. The identity_prompt plus behavior_traits shape how the persona LLM responds; the same persona can be reused across many cases.

suite_id (string<uuid> | null)

Optional parent suite. When null, the case is ad-hoc — runnable on its own but not part of a regression sweep.

description (string | null)
agent_overrides (object)

Optional per-case overrides applied to the agent's saved config at run time (e.g. a different system_prompt or flow_config for A/B testing). Same shape as the agent record. The agent on disk is never mutated.

tool_allowlist (string[])

Used only when tool_policy='allowlist'. List of tool names (camelCase) that should fire for real.

updated_at (string<date-time>)
deleted_at (string<date-time> | null)
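
The agent_overrides field above is described as being applied at run time without mutating the saved agent. One way to picture that is a copy-then-merge, sketched below. This is only an illustration: `apply_overrides` is a hypothetical name, and the assumption that top-level keys are replaced wholesale (rather than deep-merged) is not confirmed by this page.

```python
import copy


def apply_overrides(agent: dict, agent_overrides: dict) -> dict:
    """Return the effective agent config for one run; the saved agent is left untouched.

    Assumption: each top-level key in agent_overrides replaces the saved value wholesale.
    """
    effective = copy.deepcopy(agent)
    effective.update(copy.deepcopy(agent_overrides))
    return effective


saved_agent = {"system_prompt": "Be concise.", "temperature": 1}
effective_agent = apply_overrides(saved_agent, {"system_prompt": "Be warm and concise."})
```

Here `effective_agent` carries the overridden system_prompt for the run, while `saved_agent` still holds the original value.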