Required scope agent_eval:create. The agent_id must reference a non-deleted agent in this company; persona_id must reference a non-deleted persona. Suites are optional — you can create stand-alone ad-hoc cases.
A case is the runnable unit of agent eval: persona + agent + scenario + success criteria. Suites are optional; a case with suite_id: null is an ad-hoc case you can run on its own.
success_criteria is an array; each entry is one assertion with these fields:
| Field | Required for | Notes |
|---|---|---|
| type | all | One of must_say, must_not_say, must_call_tool, must_reach_node, custom_llm_judge. |
| weight | optional | Relative weight in the case score (defaults to 1). |
| pattern | must_say / must_not_say | Regex evaluated against the full agent transcript, case-insensitive. Plain substrings work because they are valid regex. Escape literal punctuation (e.g. goodbye\\?). |
| tool_name | must_call_tool | The tool name the agent must invoke at least once during the run. |
| args_match | must_call_tool (optional) | Object whose keys must equal the values in the tool's invocation args. Use "$present" to assert "key exists with any non-null value". |
| node_id | must_reach_node | Flow node id the run must enter. Silently fails for prompt-mode agents, so use this only on flow agents. |
| rubric | custom_llm_judge | Natural-language description of what a successful run looks like. Engine stub in v1: returns passed=false, reason='not_implemented' until v1.1 ships full LLM-as-judge. |
| description | optional | Human-readable note shown in the dashboard alongside this assertion. |
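To make the pattern and args_match semantics concrete, here is a minimal local sketch of how three of the assertion types could be checked before you upload a case. It is not the engine's implementation: the transcript-as-string and tool-call-as-dict shapes, the function names, and the tool name bookAppointment are all assumptions made for illustration, and the real engine may differ in details such as nested argument matching.

```python
import re

PRESENT = "$present"  # sentinel described in the args_match row


def args_match(expected: dict, actual: dict) -> bool:
    """Return True when every expected key is satisfied by the actual tool args."""
    for key, want in expected.items():
        if want == PRESENT:
            # "$present": the key must exist with any non-null value.
            if actual.get(key) is None:
                return False
        elif key not in actual or actual[key] != want:
            return False
    return True


def check(assertion: dict, transcript: str, tool_calls: list) -> bool:
    """Evaluate one assertion locally (hypothetical helper, not the engine)."""
    kind = assertion["type"]
    if kind in ("must_say", "must_not_say"):
        # Case-insensitive regex search over the full transcript.
        found = re.search(assertion["pattern"], transcript, re.IGNORECASE) is not None
        return found if kind == "must_say" else not found
    if kind == "must_call_tool":
        expected = assertion.get("args_match", {})
        # At least one invocation of the named tool must satisfy args_match.
        return any(
            call["name"] == assertion["tool_name"] and args_match(expected, call["args"])
            for call in tool_calls
        )
    raise ValueError(f"type not covered by this sketch: {kind}")


criteria = [
    # In a JSON body the backslash is doubled: "goodbye\\?".
    {"type": "must_say", "pattern": r"goodbye\?", "weight": 2},
    {"type": "must_not_say", "pattern": "refund"},
    {"type": "must_call_tool", "tool_name": "bookAppointment",
     "args_match": {"slot": "$present"}},
]
transcript = "Agent: Is that everything? Goodbye?"
tool_calls = [{"name": "bookAppointment", "args": {"slot": "2024-06-01T10:00"}}]
results = [check(a, transcript, tool_calls) for a in criteria]
```

With this sample transcript and tool call, all three assertions pass. must_reach_node and custom_llm_judge are left out because they depend on flow state and the judge stub respectively.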
tool_policy

| Policy | When to use |
|---|---|
| mock (default) | Your CI suite. Tools never fire; every call returns a synthetic success result. Free, deterministic. |
| real | One-off pre-prod check that the full integration works. Hits real systems; charges real third-party costs. |
| allowlist | Hybrid: list specific tools in tool_allowlist. The named tools fire for real, the rest mock. Useful when you want to validate one new tool but not regenerate calendar holds for the whole suite. |
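The three policies reduce to a single per-tool decision, which the following sketch spells out. The function name fires_for_real and the tool names are hypothetical; this only illustrates the rule in the table, not how the worker is actually written.

```python
from typing import Optional, Sequence


def fires_for_real(tool_policy: str, tool_name: str,
                   tool_allowlist: Optional[Sequence[str]] = None) -> bool:
    """Return True if a tool call would hit the real system under the given policy."""
    if tool_policy == "mock":
        return False  # every call gets a synthetic success result
    if tool_policy == "real":
        return True   # every call hits real systems
    if tool_policy == "allowlist":
        # Only the named tools fire for real; the rest are mocked.
        return tool_name in (tool_allowlist or [])
    raise ValueError(f"unknown tool_policy: {tool_policy}")


# Hypothetical tool names, for illustration only.
allow = ["sendSms"]
decisions = {
    name: fires_for_real("allowlist", name, allow)
    for name in ("sendSms", "bookAppointment")
}
```

Here sendSms would fire for real while bookAppointment stays mocked, which is the "validate one new tool" case from the table.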
Your Yappr API key (e.g. ypr_live_...). Generate one in the dashboard under Settings → API Keys.
"Yes path — agreement on first ask"
1 <= x <= 100
0 <= x <= 100
mock, real, allowlist
Case created
A specific eval scenario — persona + target agent + scenario + success criteria.
Agent under test. Full agent record is expanded inline as agent in API responses.
"Yes path — caller agrees on first ask"
Free-form one-paragraph framing the persona LLM is given on top of its identity. Describe the situation that prompted the call.
"The persona is responding to a missed call from your business about their recent inquiry. They have time to talk for 5 minutes."
Array of assertions evaluated after the run completes.
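The exact scoring formula is not spelled out here. One plausible reading of "relative weight" and a 0-100 weighted-score threshold is that the score is the passed weight as a percentage of the total weight. The sketch below assumes that formula, and the names weighted_score and threshold are hypothetical; treat it as a way to reason about weights, not as the engine's arithmetic.

```python
def weighted_score(results):
    """results: list of (passed, weight) pairs, one per assertion.

    ASSUMPTION: score = 100 * (sum of passed weights) / (sum of all weights).
    """
    total = sum(weight for _, weight in results)
    if total == 0:
        return 0.0
    earned = sum(weight for passed, weight in results if passed)
    return 100.0 * earned / total


# Three assertions: weight 2 (passed), weight 1 (passed), weight 1 (failed).
score = weighted_score([(True, 2), (True, 1), (False, 1)])  # 100 * 3 / 4 = 75.0
threshold = 70  # hypothetical threshold value
case_passes = score >= threshold
```

Under this assumption, a custom_llm_judge assertion (which always returns passed=false in v1) would lower the score in proportion to its weight, so it may be worth keeping its weight small until v1.1.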
Hard cap on conversation turns. Hitting this terminates the run with termination_reason='max_turns'.
Range: 1 <= x <= 100
Weighted-score threshold (0-100) for pass_fail=true.
Range: 0 <= x <= 100
How the agent's tools behave during the run. mock (default): every tool call returns a synthetic success result the worker fabricates from the tool's declared output schema. real: tools fire for real (charges real money, hits real systems). allowlist: tools whose name appears in tool_allowlist fire for real, the rest return mock results.
Allowed values: mock, real, allowlist
Reusable caller archetype consumed by eval cases. The identity_prompt plus behavior_traits shape how the persona LLM responds; the same persona can be reused across many cases.
Optional parent suite. When null, the case is ad-hoc — runnable on its own but not part of a regression sweep.
Optional per-case overrides applied to the agent's saved config at run time (e.g. a different system_prompt or flow_config for A/B testing). Same shape as the agent record. The agent on disk is never mutated.
Used only when tool_policy='allowlist'. List of tool names (camelCase) that should fire for real.
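Putting the pieces together, a request body for an ad-hoc case might be assembled as below. This sketch uses only field names that appear in this section (agent_id, persona_id, suite_id, success_criteria, tool_policy, tool_allowlist). The id placeholders and the tool name are hypothetical, and fields whose names are not shown here (case name, scenario text, turn cap, pass threshold) are deliberately left out rather than guessed.

```python
import json

payload = {
    "agent_id": "<agent-id>",        # placeholder: a non-deleted agent in your company
    "persona_id": "<persona-id>",    # placeholder: a non-deleted persona
    "suite_id": None,                # null => ad-hoc case, not part of a suite
    "success_criteria": [
        {"type": "must_say", "pattern": "goodbye\\?", "weight": 2},
        {"type": "must_call_tool", "tool_name": "bookAppointment",  # hypothetical tool
         "args_match": {"slot": "$present"}},
    ],
    "tool_policy": "allowlist",
    "tool_allowlist": ["bookAppointment"],  # camelCase tool names that fire for real
}

# Serialize for the request body; None becomes JSON null.
body = json.dumps(payload, indent=2)
```

Send body with your API key using whatever HTTP client you prefer; the endpoint path and header name are not reproduced here because this section does not state them.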