Required scope: agent_eval:update. Send only the fields you want to change.
The case's agent_id itself is not patchable. To repoint a case at a different agent, create a new case (the original keeps its run history intact).
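A minimal sketch of the partial-update rule, assuming the request body is a plain JSON object. The helper name `build_patch_body` and the `max_turns` field name are illustrative assumptions, not names confirmed by this page; only `agent_id` is documented as non-patchable.

```python
from typing import Any

# Assumption: the only field this page documents as non-patchable.
NON_PATCHABLE = {"agent_id"}


def build_patch_body(**changes: Any) -> dict[str, Any]:
    """Return a body holding only the fields the caller wants to change."""
    blocked = NON_PATCHABLE.intersection(changes)
    if blocked:
        # Repointing a case means creating a new case, not patching this one.
        raise ValueError(f"not patchable: {sorted(blocked)}; create a new case")
    return dict(changes)


# Hypothetical field name, used for illustration only.
body = build_patch_body(max_turns=12)
```

Fields left out of the body are simply not sent, so they keep their stored values.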
A common pattern: tweak success_criteria after watching a few runs. Lower the weight on a flaky assertion, raise it on a load-bearing one, or add a new must_not_say for a phrase you saw the agent leak.
Your Yappr API key (e.g. ypr_live_...). Generate one in the dashboard under Settings → API Keys.
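The criteria-tuning pattern described above can be sketched as a small pure function. The assertion shape used here (dicts with `name`, `type`, `value`, and `weight` keys) is an assumption for illustration; this page does not spell out the success_criteria schema.

```python
from copy import deepcopy


def retune(criteria, new_weights, leaked_phrases=()):
    """Return a tuned copy of success_criteria; the input list is untouched."""
    tuned = deepcopy(criteria)
    for assertion in tuned:
        # Re-weight only the assertions named in new_weights.
        if assertion["name"] in new_weights:
            assertion["weight"] = new_weights[assertion["name"]]
    for phrase in leaked_phrases:
        # Hypothetical shape for a must_not_say assertion.
        tuned.append({"name": f"no:{phrase}", "type": "must_not_say",
                      "value": phrase, "weight": 1})
    return tuned


criteria = [
    {"name": "greets", "type": "must_say", "value": "hello", "weight": 5},
    {"name": "books", "type": "must_say", "value": "booked", "weight": 5},
]
tuned = retune(criteria, {"greets": 1, "books": 10}, ["internal ticket"])
```

The tuned list would then be sent as the only changed field in the update request.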
Range: 1 <= x <= 100
Range: 0 <= x <= 100
Options: mock, real, allowlist
Updated case
A specific eval scenario — persona + target agent + scenario + success criteria.
Agent under test. Full agent record is expanded inline as agent in API responses.
"Yes path — caller agrees on first ask"
Free-form one-paragraph framing the persona LLM is given on top of its identity. Describe the situation that prompted the call.
"The persona is responding to a missed call from your business about their recent inquiry. They have time to talk for 5 minutes."
Array of assertions evaluated after the run completes.
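One plausible reading of how weighted assertions roll up into a 0-100 score that is compared against a pass threshold is sketched below. The formula (passed weight divided by total weight) is an assumption; this page does not state how the score is actually computed.

```python
def weighted_score(results):
    """results: list of (weight, passed) pairs. Returns a score in 0-100."""
    total = sum(weight for weight, _ in results)
    if total == 0:
        return 0.0  # assumption: no weighted assertions scores zero
    earned = sum(weight for weight, passed in results if passed)
    return 100.0 * earned / total


def run_passes(results, threshold):
    """True when the weighted score meets or exceeds the threshold."""
    return weighted_score(results) >= threshold


# A heavy assertion passed, a light one failed.
results = [(3, True), (1, False)]
score = weighted_score(results)
```

Under this reading, lowering the weight of a flaky assertion reduces how much a single failure can drag the score below the threshold.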
Hard cap on conversation turns. Hitting this terminates the run with termination_reason='max_turns'. Range: 1 <= x <= 100.
Weighted-score threshold (0-100) for pass_fail=true. Range: 0 <= x <= 100.
How the agent's tools behave during the run. mock (default): every tool call returns a synthetic success result the worker fabricates from the tool's declared output schema. real: tools fire for real (charges real money, hits real systems). allowlist: tools whose name appears in tool_allowlist fire for real; the rest return mock results. Options: mock, real, allowlist.
Reusable caller archetype consumed by eval cases. The identity_prompt plus behavior_traits shape how the persona LLM responds; the same persona can be reused across many cases.
Optional parent suite. When null, the case is ad-hoc — runnable on its own but not part of a regression sweep.
Optional per-case overrides applied to the agent's saved config at run time (e.g. a different system_prompt or flow_config for A/B testing). Same shape as the agent record. The agent on disk is never mutated.
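A minimal sketch of applying overrides at run time without touching the saved agent. It assumes a top-level merge in which each override key replaces the saved value wholesale; whether the real service merges nested objects more deeply is not stated here.

```python
from copy import deepcopy


def effective_config(saved_agent, overrides=None):
    """Return the config used for one run; saved_agent is never mutated."""
    merged = deepcopy(saved_agent)
    # Assumption: overrides replace top-level keys rather than deep-merging.
    merged.update(deepcopy(overrides or {}))
    return merged


saved = {"system_prompt": "You are a booking agent.", "flow_config": {"steps": 3}}
variant = effective_config(saved, {"system_prompt": "You are a terse booking agent."})
```

Running the same case with and without overrides gives a simple A/B comparison against one stored agent.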
Used only when tool_policy='allowlist'. List of tool names (camelCase) that should fire for real.
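The three tool_policy modes reduce to a single decision per tool call. The sketch below restates that rule; the function name and the example tool names are hypothetical.

```python
def fires_for_real(tool_name, tool_policy="mock", tool_allowlist=()):
    """Decide whether a tool call hits real systems or returns a mock result."""
    if tool_policy == "mock":
        return False  # every call gets a synthetic success result
    if tool_policy == "real":
        return True  # every call fires for real
    if tool_policy == "allowlist":
        # tool_allowlist is consulted only under this policy.
        return tool_name in tool_allowlist
    raise ValueError(f"unknown tool_policy: {tool_policy!r}")


# Hypothetical camelCase tool names.
allow = ["lookupCustomer"]
decisions = {
    name: fires_for_real(name, "allowlist", allow)
    for name in ("lookupCustomer", "chargeCard")
}
```

With this allowlist, a read-only lookup runs against real systems while the charge stays mocked.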