Explore live verifier examples at gym.scale.com.
Comparison
| Environment | Verifier Access | Check Types |
|---|---|---|
| Website | /api/verifier endpoint (Docker) | State, Log, Rubric |
| Desktop | /run_evaluator endpoint | File comparison, Rules-based |
| MCP | Scale Gymnasium Web UI only | LLM Judge, Rubric |
Choose Your Environment
Website Verifiers
State checks, log checks, and rubric evaluation for web apps
Desktop Verifiers
File comparison and rules-based evaluation for VMs
MCP Verifiers
LLM Judge and rubric claims for tool-use tasks
Common Concepts
Score Interpretation
| Score | Meaning |
|---|---|
| 1.0 | Task completed successfully |
| 0.0 | Task not completed |
| 0.0 - 1.0 | Partial completion (where applicable) |
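
The score table above can be read as a simple mapping. The following is a minimal sketch, not part of any Scale Gymnasium SDK: the function name `interpret_score` is hypothetical, and rejecting values outside the 0.0 to 1.0 range is an assumption rather than documented behavior.

```python
def interpret_score(score: float) -> str:
    """Map a verifier score to the meaning given in the score table.

    Hypothetical helper for illustration; not a Scale Gymnasium API.
    """
    # Assumption: scores outside [0.0, 1.0] are treated as invalid.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score == 1.0:
        return "Task completed successfully"
    if score == 0.0:
        return "Task not completed"
    # Any value strictly between 0.0 and 1.0 counts as partial completion.
    return "Partial completion"


if __name__ == "__main__":
    for value in (1.0, 0.0, 0.5):
        print(value, "->", interpret_score(value))
```

Partial scores only appear for verifiers that support them, so a caller that expects a strict pass or fail may want to compare against 1.0 directly.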
Result Statuses
| Status | Meaning |
|---|---|
| ✅ Passed | All checks succeeded |
| ❌ Failed | One or more checks failed |
| ⏳ Pending | Rubric checks awaiting LLM evaluation |
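
One way to picture how individual check outcomes roll up into the statuses above is sketched below. The names `Status` and `overall_status` are hypothetical, and the representation of a check as `True`, `False`, or `None` (not yet evaluated) is an assumption. The precedence of Failed over Pending is also an assumption; the table does not say which status applies when one check has failed while a rubric check is still awaiting evaluation.

```python
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    PASSED = "Passed"
    FAILED = "Failed"
    PENDING = "Pending"


def overall_status(check_results: Iterable[Optional[bool]]) -> Status:
    """Combine per-check outcomes into one result status.

    Each entry is True (check succeeded), False (check failed), or
    None (rubric check still awaiting LLM evaluation).
    Hypothetical helper for illustration; not a Scale Gymnasium API.
    """
    results = list(check_results)
    # Assumption: a single failed check fails the run, even if others are pending.
    if any(r is False for r in results):
        return Status.FAILED
    # No failures so far, but at least one check has not been evaluated yet.
    if any(r is None for r in results):
        return Status.PENDING
    # Every check succeeded.
    return Status.PASSED


if __name__ == "__main__":
    print(overall_status([True, True]))   # all checks succeeded
    print(overall_status([True, None]))   # a rubric check is still pending
    print(overall_status([True, False]))  # one check failed
```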