Verifiers validate whether an agent successfully completed a task. Each environment type uses a different verification approach.
Explore live verifier examples at gym.scale.com.

Comparison

| Environment | Verifier Access | Check Types |
| --- | --- | --- |
| Website | /api/verifier endpoint (Docker) | State, Log, Rubric |
| Desktop | /run_evaluator endpoint | File comparison, Rules-based |
| MCP | Scale Gymnasium Web UI only | LLM Judge, Rubric |

Common Concepts

Score Interpretation

| Score | Meaning |
| --- | --- |
| 1.0 | Task completed successfully |
| 0.0 | Task not completed |
| 0.0 - 1.0 | Partial completion (where applicable) |
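
The score table can be read as a simple lookup. The sketch below is illustrative only: `interpret_score` is a hypothetical helper, not part of any verifier API, and it assumes scores arrive as floats in the closed range 0.0 to 1.0.

```python
def interpret_score(score: float) -> str:
    """Map a verifier score to its meaning from the score table.

    Hypothetical helper for illustration; assumes the score is a float
    in the closed range [0.0, 1.0].
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    if score == 1.0:
        return "Task completed successfully"
    if score == 0.0:
        return "Task not completed"
    # Anything strictly between 0.0 and 1.0 is partial credit.
    return "Partial completion"


if __name__ == "__main__":
    for value in (1.0, 0.0, 0.5):
        print(value, "->", interpret_score(value))
```

Values strictly between the two endpoints only occur for verifiers that support partial credit, so a caller that expects a binary outcome may prefer to compare against 1.0 directly.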

Result Statuses

| Status | Meaning |
| --- | --- |
| Passed | All checks succeeded |
| Failed | One or more checks failed |
| Pending | Rubric checks awaiting LLM evaluation |
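
One way to picture how individual checks roll up into a result status is sketched below. Everything here is an assumption made for illustration: the `Status` enum, the `overall_status` function, and the encoding of each check as `True` (succeeded), `False` (failed), or `None` (still awaiting LLM evaluation) are hypothetical. The source also does not say which status wins when one check has failed and another is still pending; this sketch assumes Failed takes precedence.

```python
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    PASSED = "Passed"
    FAILED = "Failed"
    PENDING = "Pending"


def overall_status(check_results: Iterable[Optional[bool]]) -> Status:
    """Combine per-check outcomes into one result status.

    Assumed encoding (not from the source): True = check succeeded,
    False = check failed, None = rubric check awaiting LLM evaluation.
    """
    results = list(check_results)
    # Assumption: any failed check decides the result immediately.
    if any(r is False for r in results):
        return Status.FAILED
    # No failures so far, but at least one check has not been evaluated.
    if any(r is None for r in results):
        return Status.PENDING
    # Every check succeeded.
    return Status.PASSED


if __name__ == "__main__":
    print(overall_status([True, True]).value)
    print(overall_status([True, None]).value)
    print(overall_status([True, False, None]).value)
```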