Explore live task examples at gym.scale.com.
Task Components
Every task is composed of four core components:

| Component | Purpose |
|---|---|
| Prompt | Natural language instructions for the agent |
| Initial State | Starting environment configuration (via data packs) |
| Available Tools | The tools/actions the agent can use |
| Verifier | The mechanism for measuring success |
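To make the four components concrete, here is a minimal sketch of how a task could be modeled in code. The `Task` class, its field names, the tool names, and the calendar state layout are all hypothetical and chosen for illustration; they are not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical state representation: a plain dictionary.
State = Dict[str, Any]


@dataclass
class Task:
    """Illustrative container for the four components (assumed names)."""
    prompt: str                        # natural language instructions
    initial_state: State               # starting environment, e.g. from a data pack
    available_tools: List[str]         # tools/actions the agent may use
    verifier: Callable[[State], bool]  # measures success on the final state


def has_friday_meeting(state: State) -> bool:
    """Return True if the expected meeting exists in the final state."""
    return any(
        event.get("title") == "Meeting with John"
        and event.get("day") == "Friday"
        and event.get("time") == "14:00"
        for event in state.get("events", [])
    )


task = Task(
    prompt="Schedule a meeting with John for Friday at 2pm",
    initial_state={"events": []},
    available_tools=["calendar.create_event", "calendar.list_events"],
    verifier=has_friday_meeting,
)
```

In this sketch the verifier takes only the final state, so it stays independent of how the agent got there.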
Task Types
Tasks are categorized by the primary action required:

| Type | Description | Example |
|---|---|---|
| Information Retrieval | Agent gathers and reports information without modifying state | "What events are scheduled for tomorrow?" |
| State Modification | Agent performs actions that change the environment | "Schedule a meeting with John for Friday at 2pm" |
| Hybrid | Combination of retrieval and modification | "Find all overdue invoices and send reminder emails" |
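One way to read the table is that the task type determines what a verifier has to compare. The sketch below assumes a retrieval task is checked on the reported answer plus an unchanged state, a modification task on the final state, and a hybrid task on both. The `TaskType` enum and `verify` function are hypothetical names, not part of any documented API.

```python
from enum import Enum
from typing import Any, Dict, Optional

State = Dict[str, Any]


class TaskType(Enum):
    INFORMATION_RETRIEVAL = "information_retrieval"
    STATE_MODIFICATION = "state_modification"
    HYBRID = "hybrid"


def verify(
    task_type: TaskType,
    before: State,
    after: State,
    answer: Any = None,
    expected_answer: Any = None,
    expected_state: Optional[State] = None,
) -> bool:
    """Assumed verification rules per task type (illustrative only)."""
    answer_ok = answer == expected_answer
    state_ok = after == expected_state
    if task_type is TaskType.INFORMATION_RETRIEVAL:
        # Retrieval must report the right answer and leave the state untouched.
        return answer_ok and after == before
    if task_type is TaskType.STATE_MODIFICATION:
        # Modification is judged on the resulting state alone.
        return state_ok
    # Hybrid: both the reported answer and the resulting state must match.
    return answer_ok and state_ok


before = {"events": ["standup"]}
read_only_ok = verify(
    TaskType.INFORMATION_RETRIEVAL, before, {"events": ["standup"]},
    answer=["standup"], expected_answer=["standup"],
)
read_only_mutated = verify(
    TaskType.INFORMATION_RETRIEVAL, before, {"events": []},
    answer=["standup"], expected_answer=["standup"],
)
```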
Choose Your Environment
- Website Tasks: Prompts, subproblems, and JSON verifiers for web apps
- Desktop Tasks: Initialization configs and file-based evaluators for VMs
- MCP Tasks: Tool constraints, trajectories, and GTFA claims
Best Practices
Write Clear Prompts
- Be specific about the desired outcome
- Include all necessary context
- Avoid ambiguous instructions
- Use natural language a human would understand
Design Measurable Outcomes
- Identify the exact state changes to verify
- Include both positive and negative checks
- Consider partial completion scenarios
- Design for automated verification
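The points above can be sketched as a small scoring function that combines positive checks (things that must be true), negative checks (things that must not have happened), and a partial-credit ratio. The `score` function, the check names, and the invoice state layout are assumptions made for this example only.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Check = Tuple[str, Callable[[State], bool]]


def score(state: State, positive: List[Check], negative: List[Check]) -> Dict[str, Any]:
    """Evaluate named checks against a final state (illustrative sketch)."""
    passed = [name for name, check in positive if check(state)]
    # A negative check returns True when a forbidden side effect occurred.
    violated = [name for name, check in negative if check(state)]
    partial = len(passed) / len(positive) if positive else 0.0
    return {
        "success": len(passed) == len(positive) and not violated,
        "partial_credit": partial,
        "passed": passed,
        "violated": violated,
    }


# Hypothetical final state: only one of two overdue invoices got a reminder.
final_state = {"reminders_sent": ["inv-1"]}

positive_checks: List[Check] = [
    ("reminder sent for inv-1", lambda s: "inv-1" in s["reminders_sent"]),
    ("reminder sent for inv-2", lambda s: "inv-2" in s["reminders_sent"]),
]
negative_checks: List[Check] = [
    ("reminder sent for paid invoice inv-3", lambda s: "inv-3" in s["reminders_sent"]),
]

result = score(final_state, positive_checks, negative_checks)
```

Here `result["success"]` is `False` while `result["partial_credit"]` is `0.5`, which separates "did nothing" from "did half the work" without a human in the loop.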
Control Initial State
- Use data packs for consistent starting conditions
- Document any manual setup required
- Ensure reproducibility across runs
- Consider edge cases in initial data
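A rough sketch of what reproducibility can look like in practice: give every run its own copy of the starting data and compare a fingerprint of the state before each run. The `data_pack` dictionary, `fresh_state`, and `fingerprint` are hypothetical stand-ins and do not describe how data packs are actually stored or loaded.

```python
import copy
import hashlib
import json
from typing import Any, Dict

State = Dict[str, Any]


def fingerprint(state: State) -> str:
    """Stable hash of a JSON-serializable state (keys sorted for determinism)."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fresh_state(data_pack: State) -> State:
    """Give each run its own deep copy so runs cannot contaminate each other."""
    return copy.deepcopy(data_pack)


# Hypothetical data pack, including an edge case: an event with no attendees.
data_pack = {
    "events": [
        {"title": "Standup", "attendees": ["john"]},
        {"title": "Focus time", "attendees": []},
    ]
}

run_1 = fresh_state(data_pack)
run_1["events"].append({"title": "Meeting with John", "attendees": ["john"]})

run_2 = fresh_state(data_pack)
same_start = fingerprint(run_2) == fingerprint(data_pack)
```

Because `run_1` mutated only its own copy, `same_start` is `True` for the second run.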
Allow Multiple Paths
- Avoid over-constraining the solution path
- Allow multiple valid approaches when appropriate
- Test tasks with human annotators first
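As an illustration of not over-constraining the solution path, the sketch below verifies only the outcome, so two different action sequences that reach the same end state both pass. The toy `run` environment, the action strings, and `outcome_verifier` are invented for this example.

```python
from typing import Any, Dict, List

State = Dict[str, Any]


def run(actions: List[str]) -> State:
    """Toy environment: only 'send_reminder:<email>' actions change state."""
    state: State = {"reminded": []}
    for action in actions:
        if action.startswith("send_reminder:"):
            state["reminded"].append(action.split(":", 1)[1])
    return state


def outcome_verifier(final_state: State) -> bool:
    """Check the end state only, not the sequence of steps taken."""
    return final_state.get("reminded") == ["alice@example.com"]


# Two different but equally valid approaches (hypothetical action names).
path_a = ["search_invoices", "send_reminder:alice@example.com"]
path_b = ["list_customers", "open_customer:alice", "send_reminder:alice@example.com"]

both_pass = outcome_verifier(run(path_a)) and outcome_verifier(run(path_b))
```

A verifier that instead required an exact action sequence would reject one of these paths even though the result is identical.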