Verifiers validate whether an agent successfully completed a task. Each environment type uses a different verification approach.
Explore live verifier examples at gym.scale.com.

Comparison

Environment | Verifier Access | Check Types
Website | /api/verifier endpoint (Docker) | State, Log, Rubric
Desktop | /run_evaluator endpoint | File comparison, Rules-based
MCP | Scale Gymnasium Web UI only | LLM Judge, Rubric

Choose Your Environment

Website Verifiers

State checks, log checks, and rubric evaluation for web apps

Desktop Verifiers

File comparison and rules-based evaluation for VMs

MCP Verifiers

LLM Judge and rubric claims for tool-use tasks

Common Concepts

Score Interpretation

Score | Meaning
1.0 | Task completed successfully
0.0 | Task not completed
0.0 - 1.0 | Partial completion (where applicable)
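
The score meanings above can be sketched as a small helper. This is an illustrative sketch only: the function name `interpret_score` is hypothetical and not part of any Scale API, and it assumes scores are plain floats in the range 0.0 to 1.0.

```python
def interpret_score(score: float) -> str:
    """Map a verifier score to the meaning listed in the score table.

    Hypothetical helper for illustration; not part of the verifier API.
    """
    # Assumption: scores outside [0.0, 1.0] are invalid.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    if score == 1.0:
        return "Task completed successfully"
    if score == 0.0:
        return "Task not completed"
    # Anything strictly between 0.0 and 1.0 is partial credit.
    return "Partial completion"


if __name__ == "__main__":
    for value in (1.0, 0.0, 0.5):
        print(value, "->", interpret_score(value))
```

Rejecting out-of-range values up front is a design choice of this sketch; a real verifier may clamp or report them differently.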

Result Statuses

Status | Meaning
Passed | All checks succeeded
Failed | One or more checks failed
Pending | Rubric checks awaiting LLM evaluation
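
One way to picture how an overall status could follow from individual checks is sketched below. Everything here is an assumption made for illustration: the `overall_status` function, the use of `True`/`False`/`None` for check outcomes, and in particular the precedence rule (a failed check wins over a pending one) are not specified by this page.

```python
from typing import Optional, Sequence


def overall_status(checks: Sequence[Optional[bool]]) -> str:
    """Derive a result status from individual check outcomes.

    Hypothetical encoding: True = succeeded, False = failed,
    None = rubric check still awaiting LLM evaluation.
    """
    if not checks:
        raise ValueError("at least one check is required")
    # Assumption: any failure is final, even if other checks are pending.
    if any(outcome is False for outcome in checks):
        return "Failed"
    # No failures so far, but at least one check has not been evaluated.
    if any(outcome is None for outcome in checks):
        return "Pending"
    # Every check succeeded.
    return "Passed"


if __name__ == "__main__":
    print(overall_status([True, True]))   # all checks succeeded
    print(overall_status([True, None]))   # a rubric check is outstanding
    print(overall_status([True, False]))  # one check failed
```

If the platform instead reports Pending until every rubric check has been evaluated, the two `any` tests would simply swap order.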

Next Steps

Task Design

Create tasks with verifiers

Data Packs

Configure initial environment state