Task Components
| Component | Purpose |
|---|---|
| ENABLED_TOOLS | Tools available to the model (constraint space) |
| PROMPT | User’s query |
| TRAJECTORY | Intended sequence of tool calls (ground truth) |
| GTFA_CLAIMS | Required factual claims in final response |
Component Details
ENABLED_TOOLS
Defines which tools the agent can access. Constraining tools increases task focus and reduces ambiguity.
- Include only tools relevant to the task
- Add irrelevant tools only when you want to test the agent’s ability to discriminate between tools
PROMPT
The natural language instruction given to the agent.
- Be specific about expected outcomes
- Include implicit constraints (e.g., “during business hours”)
TRAJECTORY
The expected sequence of tool calls (ground truth). Used for trajectory-based evaluation.
GTFA_CLAIMS
Ground Truth Factual Assertions that must be present in the agent’s final response.
Example Task Structure
Complete Example
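As an illustrative sketch only, the four components from the table above could be held in one structure like the following. The `Task` and `ToolCall` classes, the tool names (`contacts_lookup`, `email_send`, `calendar_search`), the prompt text, and the consistency check are all assumptions made for this example, not the framework's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str                                  # hypothetical tool name
    args: dict = field(default_factory=dict)   # hypothetical arguments


@dataclass
class Task:
    enabled_tools: list[str]       # ENABLED_TOOLS: the constraint space
    prompt: str                    # PROMPT: the user's query
    trajectory: list[ToolCall]     # TRAJECTORY: intended tool calls
    gtfa_claims: list[str]         # GTFA_CLAIMS: required factual claims

    def tools_outside_constraint_space(self) -> list[str]:
        """Return trajectory tools that are missing from enabled_tools."""
        enabled = set(self.enabled_tools)
        return [call.tool for call in self.trajectory if call.tool not in enabled]


example_task = Task(
    enabled_tools=["contacts_lookup", "email_send", "calendar_search"],
    prompt="Send John the Q3 Report by email.",
    trajectory=[
        ToolCall("contacts_lookup", {"name": "John"}),
        ToolCall("email_send", {"to": "john@example.com", "subject": "Q3 Report"}),
    ],
    gtfa_claims=[
        "Agent sent email to john@example.com with subject containing 'Q3 Report'",
    ],
)

# An empty list means every ground-truth call uses an enabled tool.
print(example_task.tools_outside_constraint_space())
```

The consistency check is a design suggestion: a ground-truth trajectory that relies on a tool outside ENABLED_TOOLS would make the task impossible to complete as specified.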
Best Practices
Define clear success criteria
GTFA claims should be specific and unambiguous:
- ❌ “Agent completed the task correctly”
- ✅ “Agent sent email to john@example.com with subject containing ‘Q3 Report’”
Include boundary cases
Test agent behavior at the edges:
- What if no results are found?
- What if multiple matches exist?
- What if required information is missing?
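One way to make sure a task set covers these edges is to classify which case each query hits against the test data. The sketch below assumes a hypothetical contacts list and a hypothetical `classify_lookup` helper; the field names and case labels are invented for illustration.

```python
def classify_lookup(contacts: list[dict], name: str) -> str:
    """Label which boundary case a name lookup hits in the given contacts."""
    # Match on any whitespace-separated part of the contact's name.
    matches = [c for c in contacts if name.lower() in c["name"].lower().split()]
    if not matches:
        return "no_results"
    if len(matches) > 1:
        return "multiple_matches"
    if not matches[0].get("email"):
        return "missing_information"
    return "normal"


# Hypothetical data pack contents.
contacts = [
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "Alex Kim", "email": "alex.kim@example.com"},
    {"name": "Alex Moreno", "email": "alex.moreno@example.com"},
    {"name": "Priya Nair", "email": None},
]

for query in ["John", "Zoe", "Alex", "Priya"]:
    print(query, "->", classify_lookup(contacts, query))
```

Running a helper like this over every prompt in a task set shows quickly whether any of the three boundary cases is never exercised.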
Balance tool availability
- Too few tools: Task may be impossible
- Too many tools: Agent may struggle to select the right one
- Include 1-2 “distractor” tools to test discrimination
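The balance described above can be sketched as a small helper that combines the required tools with a fixed number of distractors. The function `build_enabled_tools`, its parameters, and the tool names in the usage line are hypothetical; only the standard library `random` module is used.

```python
import random


def build_enabled_tools(required, distractor_pool, n_distractors=2, seed=0):
    """Return the required tools plus n_distractors tools drawn from the pool."""
    # Never count a required tool as a distractor.
    candidates = [t for t in distractor_pool if t not in required]
    if n_distractors > len(candidates):
        raise ValueError("not enough distinct distractor tools in the pool")
    rng = random.Random(seed)          # seeded so the task is reproducible
    tools = list(required) + rng.sample(candidates, n_distractors)
    rng.shuffle(tools)                 # avoid leaking relevance through ordering
    return tools


enabled = build_enabled_tools(
    required=["contacts_lookup", "email_send"],
    distractor_pool=["weather_get", "calendar_search", "email_send", "file_delete"],
)
print(enabled)
```

Shuffling is a design choice here: if required tools always come first, list position could hint at which tools matter.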
Design for multiple valid paths
Many tasks can be completed in different ways. Ensure GTFA claims verify outcomes, not specific tool sequences.
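A minimal sketch of an outcome-based check follows. It assumes a run can be summarized by a hypothetical `RunResult` holding the tool sequence and an `outbox` of sent emails; both the structure and the tool names are invented for illustration. The check inspects only the outcome, so two different tool sequences can both pass.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    tool_sequence: list[str]   # tools the agent actually called
    outbox: list[dict]         # hypothetical record of emails sent in the run


def q3_report_sent_to_john(result: RunResult) -> bool:
    """Check the outcome only; the tool sequence is deliberately ignored."""
    return any(
        msg.get("to") == "john@example.com" and "Q3 Report" in msg.get("subject", "")
        for msg in result.outbox
    )


run_a = RunResult(
    ["contacts_lookup", "email_send"],
    [{"to": "john@example.com", "subject": "Q3 Report attached"}],
)
run_b = RunResult(
    ["email_search", "email_forward"],
    [{"to": "john@example.com", "subject": "Fwd: Q3 Report"}],
)
run_c = RunResult(
    ["email_send"],
    [{"to": "jane@example.com", "subject": "Q3 Report"}],
)

for name, run in [("run_a", run_a), ("run_b", run_b), ("run_c", run_c)]:
    print(name, q3_report_sent_to_john(run))
```

Here `run_a` and `run_b` take different paths and both satisfy the claim, while `run_c` fails because the email went to the wrong recipient.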
Test with realistic data
Use data packs that reflect real usage patterns: realistic contact names, plausible email content, and believable calendar schedules.