/api/verifier HTTP endpoint.
Access
| Method | Endpoint |
|---|---|
| Docker | POST /api/verifier |
| Web UI | “Run Verifier” button |
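As a rough illustration of the Docker access path, the sketch below builds a POST request for `/api/verifier` using only the Python standard library. The base URL, the empty JSON body, and the `build_verifier_request` helper are assumptions made for this example; the source does not say which host, port, or payload the endpoint expects.

```python
import json
import urllib.request
from typing import Optional


def build_verifier_request(base_url: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/verifier."""
    # Assumption: the endpoint accepts a JSON body; an empty object is sent by default.
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/verifier",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address; use whatever host and port your container exposes.
request = build_verifier_request("http://localhost:8000")
# To send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a running environment.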
Check Types
| Type | Purpose | Data Source |
|---|---|---|
| State Check | Verify database changes | SQLite database |
| Log Check | Verify user interactions | Session logs |
| Rubric Check | LLM-based evaluation | Agent response |
State Checks
Verify that the database contains expected values after agent actions.
Components
| Field | Description |
|---|---|
| table | Which database table to query |
| conditions | Filters to match records |
| assertions | Expected values to verify |
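The three fields can be read as "query this table, filter by these conditions, then compare these columns". The following is a minimal sketch of that idea against an in-memory SQLite database, not the verifier's actual implementation. The `run_state_check` function, the `orders` schema, and the equality-only matching are all assumptions made for illustration.

```python
import sqlite3


def run_state_check(conn, table, conditions, assertions):
    """Return (passed, rows) for one state check.

    table:      table name; interpolated into the SQL text, so it must come
                from a trusted check definition (same for column names)
    conditions: column -> value filters, combined with AND
    assertions: column -> expected value, checked on every matching row
    """
    where = " AND ".join(f"{col} = ?" for col in conditions) or "1 = 1"
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {where}", list(conditions.values())
    )
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    # Existence assertion first (at least one row), then value assertions.
    passed = bool(rows) and all(
        row.get(col) == expected
        for row in rows
        for col, expected in assertions.items()
    )
    return passed, rows


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders (user_id, status) VALUES (7, 'completed')")

passed, rows = run_state_check(conn, "orders", {"user_id": 7}, {"status": "completed"})
```

Returning the matched rows alongside the verdict keeps the data that was searched available when a check fails.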
Example Use Cases
- “events table has entry with title ‘Meeting’” (Calendr)
- “users table has email updated to ‘new@email.com’” (Cloudfile)
- “cart_items table is empty after checkout” (Shopora)
- “orders table has new record with status ‘completed’” (Shopora)
- “messages table has email with subject ‘Re: Project Update’” (Pandora’s Inbox)
Best Practices
- Target specific tables and fields
- Use precise conditions to isolate records
- Include both existence and value assertions
- Consider order-independent matching for arrays
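For the last practice, one possible way to compare array values without depending on order is to count serialized items on both sides. This is a sketch only; the `same_items` helper is hypothetical and assumes the array elements are JSON-serializable.

```python
import json
from collections import Counter


def same_items(expected, actual):
    """Compare two lists ignoring order but respecting duplicates."""

    def key(item):
        # Serialize each item so unhashable values (dicts, lists) can be counted.
        return json.dumps(item, sort_keys=True)

    return Counter(map(key, expected)) == Counter(map(key, actual))


order_ignored = same_items(["a", "b"], ["b", "a"])
duplicates_respected = not same_items(["a", "a"], ["a"])
```

Using a `Counter` rather than a `set` means a list with a duplicated entry does not match a list that contains the entry only once.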
Log Checks
Verify that specific user interactions occurred during the task.
Components
| Field | Description |
|---|---|
| event_type | Type of interaction (click, input, navigation) |
| element_id | Target element identifier |
| value | Expected value (for input events) |
Example Use Cases
- “User clicked the ‘Submit’ button”
- “User navigated to /settings”
- “User entered ‘John’ in name field”
- “User selected ‘California’ from dropdown”
- “User opened email thread with subject ‘Q4 Budget Allocation’” (Pandora’s Inbox)
What Gets Logged
Website environments automatically capture:
- Button clicks with element IDs
- Form inputs with values
- Page navigations
- Dropdown selections
- Checkbox/radio changes
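A log check can be thought of as a filter over the captured session log. The sketch below assumes each log entry is a dictionary carrying the same `event_type`, `element_id`, and `value` keys as the check itself; the real log schema is not shown in the source, so treat the entry shape and the `run_log_check` helper as hypothetical.

```python
def run_log_check(log_entries, event_type, element_id=None, value=None):
    """Return the log entries that satisfy a log check.

    Fields left as None are not compared.
    """
    wanted = {"event_type": event_type, "element_id": element_id, "value": value}
    wanted = {k: v for k, v in wanted.items() if v is not None}
    return [
        entry
        for entry in log_entries
        if all(entry.get(k) == v for k, v in wanted.items())
    ]


# Hypothetical session log; the real log format may differ.
session_log = [
    {"event_type": "navigation", "value": "/settings"},
    {"event_type": "input", "element_id": "name", "value": "John"},
    {"event_type": "click", "element_id": "submit"},
]

matches = run_log_check(session_log, "input", element_id="name", value="John")
check_passed = bool(matches)
```

Here the check passes because at least one logged interaction matches every field that was specified.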
Rubric Checks
LLM-evaluated criteria for qualitative assessment of agent responses.
Components
| Field | Description |
|---|---|
| criteria | Description of what to evaluate |
| rubric | Scoring guidelines |
Behavior
- Agent completes task and produces response
- Rubric check returns “PENDING” status
- External LLM evaluates response against criteria
- Final pass/fail determined
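The lifecycle above can be sketched as a check that starts as “PENDING” and is resolved once an evaluator returns a verdict. In this example the evaluator is a plain keyword function standing in for the external LLM, and the `PASS` and `FAIL` status names, the helper functions, and the rubric text are assumptions; only the “PENDING” status comes from the description above.

```python
# Status names other than "PENDING" are assumptions for this sketch.
PENDING, PASS, FAIL = "PENDING", "PASS", "FAIL"


def new_rubric_check(criteria, rubric):
    """A rubric check starts out PENDING until an evaluator has judged it."""
    return {"criteria": criteria, "rubric": rubric, "status": PENDING}


def resolve_rubric_check(check, response, judge):
    """Ask the evaluator for a verdict and record the final pass/fail."""
    verdict = judge(check["criteria"], check["rubric"], response)
    return {**check, "status": PASS if verdict else FAIL}


def keyword_judge(criteria, rubric, response):
    """Stand-in for the external LLM: passes if the response mentions 'meeting'."""
    return "meeting" in response.lower()


check = new_rubric_check(
    criteria="Response accurately summarizes the calendar events",
    rubric="Pass if every event on the calendar is mentioned.",
)
resolved = resolve_rubric_check(check, "You have one event: Meeting at 10am.", keyword_judge)
```

Passing the evaluator in as a function keeps the pending-then-resolved flow testable without calling a real model.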
Example Use Cases
- “Response accurately summarizes the calendar events” (Calendr)
- “Agent provided helpful and accurate information”
- “Output follows the requested format”
- “Response correctly identifies the date of the calendar invitation in the email” (Pandora’s Inbox)
Response Structure
Debugging Failed Checks
When checks fail, the response includes:
| Field | Description |
|---|---|
| all_results | All data that was searched |
| expected | What was expected |
| actual | What was found |
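When reading a failed check, it can help to print the three fields side by side. The sketch below assumes a failed check arrives as a dictionary with `expected`, `actual`, and `all_results` keys; the exact response layout is not shown here, so the sample data and the `describe_failure` helper are hypothetical.

```python
def describe_failure(result):
    """Format a failed check's debugging fields as a short text report."""
    searched = result.get("all_results") or []
    lines = [
        f"expected: {result.get('expected')!r}",
        f"actual:   {result.get('actual')!r}",
        f"searched {len(searched)} record(s):",
    ]
    lines.extend(f"  - {record!r}" for record in searched)
    return "\n".join(lines)


# Hypothetical failed state check, for demonstration only.
failed_check = {
    "expected": {"status": "completed"},
    "actual": {"status": "pending"},
    "all_results": [{"id": 1, "user_id": 7, "status": "pending"}],
}
report = describe_failure(failed_check)
```

Comparing `expected` with `actual` shows what went wrong, while `all_results` helps tell a wrong value apart from a condition that matched the wrong record.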