Scale Gymnasium provides a standardized collection of digital environments for training and evaluating agents. Whether you’re building tool-use agents or GUI agents for browser or desktop use, Gymnasium offers the environments, data, and verification systems you need.

Environments

Website Environments

4 synthetic web apps
Calendr · Cloudfile · Shopora · Pandora’s Inbox
Full-stack websites with databases, logging, and built-in verification

Desktop Environments

3 VM platforms
Windows · Ubuntu · macOS
Isolated virtual machines for testing computer-use agents

MCP Environment

45+ servers · 300+ tools
Calendar · CRM · Email · Shopping · and more
Tool-based agent interactions via the Model Context Protocol (MCP)
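MCP is built on JSON-RPC 2.0: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The minimal sketch below only builds those request messages. The transport (stdio or HTTP), the `initialize` handshake that must come first, and the tool name `create_event` with its arguments are assumptions for illustration, not actual Gymnasium tool definitions.

```python
import json
from itertools import count
from typing import Optional

# Monotonically increasing JSON-RPC request ids.
_ids = count(1)


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object of the kind MCP clients send."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask the server which tools it exposes.
list_req = make_request("tools/list")

# Invoke one tool. "create_event" and its arguments are hypothetical
# placeholders, not real Gymnasium tool names.
call_req = make_request(
    "tools/call",
    {"name": "create_event",
     "arguments": {"title": "Standup", "start": "2025-01-06T09:00:00"}},
)

print(json.dumps(list_req))
print(json.dumps(call_req, indent=2))
```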

Two Ways to Use Scale Gymnasium

Method | Agent Loop | Best For
Gymnasium Web UI | Built-in: Scale provides an agent loop executor | Prototyping, testing, exploration
Docker Images | Bring your own: you implement the agent loop | Large-scale evaluation, CI/CD, training pipelines
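With the Docker images, your code owns the agent loop: observe the environment, choose an action, apply it, and repeat until the task ends or a step budget runs out. The sketch below assumes a hypothetical `reset`/`step` interface plus toy stand-ins for the environment and the agent; in practice these calls would go to the environment's HTTP endpoints, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass
class StepResult:
    observation: str
    done: bool


class Environment(Protocol):
    """Hypothetical interface; the real containers expose their own HTTP API."""
    def reset(self, task_prompt: str) -> str: ...
    def step(self, action: str) -> StepResult: ...


class Agent(Protocol):
    def act(self, observation: str) -> str: ...


def run_episode(env: Environment, agent: Agent, task_prompt: str,
                max_steps: int = 20) -> List[Tuple[str, str]]:
    """Drive one episode and return the (observation, action) trajectory."""
    trajectory = []
    obs = env.reset(task_prompt)
    for _ in range(max_steps):
        action = agent.act(obs)
        trajectory.append((obs, action))
        result = env.step(action)
        obs = result.observation
        if result.done:  # environment signals task completion
            break
    return trajectory


# Toy stand-ins so the loop runs end to end.
class CounterEnv:
    def reset(self, task_prompt: str) -> str:
        self.count = 0
        return f"{task_prompt} | count=0"

    def step(self, action: str) -> StepResult:
        if action == "increment":
            self.count += 1
        return StepResult(f"count={self.count}", done=self.count >= 3)


class IncrementAgent:
    def act(self, observation: str) -> str:
        return "increment"


traj = run_episode(CounterEnv(), IncrementAgent(), "Reach 3")
print(len(traj))
```

Capping `max_steps` keeps a misbehaving policy from looping forever, which matters when running many episodes in CI or training pipelines.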

Web UI Guide

Try environments instantly through the Gymnasium interface

Docker Quick Start

Run environments locally with your own agent loop

Key Capabilities

Session Isolation

Complete data isolation between test sessions ensures no cross-contamination

Comprehensive Logging

Automatic capture of all agent interactions for analysis and debugging

Built-in Verification

A verification framework combining state checks, log validation, and rubric-based evaluation
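One way to picture how the three verification layers fit together: state checks and log checks act as pass/fail gates on the final application state and the recorded interactions, while a rubric yields a weighted partial-credit score. This is a hypothetical sketch; the state and log shapes, the `verify` function, and the rubric items are illustrative assumptions, and real rubric grading may use a model-based judge rather than the Python predicates used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, list]          # final app state (hypothetical shape)
EventLog = List[Dict[str, str]]  # logged agent actions (hypothetical shape)


@dataclass
class RubricItem:
    description: str
    weight: float
    check: Callable[[State, EventLog], bool]


def verify(final_state: State, event_log: EventLog,
           state_checks: List[Callable[[State], bool]],
           log_checks: List[Callable[[EventLog], bool]],
           rubric: List[RubricItem]) -> dict:
    # Hard gates: the task fails if any state or log check fails.
    passed = (all(check(final_state) for check in state_checks)
              and all(check(event_log) for check in log_checks))
    # Soft score: weighted fraction of rubric items satisfied.
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric
                 if item.check(final_state, event_log))
    return {"passed": passed, "score": earned / total if total else 0.0}


final_state = {"events": [{"title": "Standup", "attendees": ["ana", "li"]}]}
event_log = [{"action": "create_event", "title": "Standup"}]

result = verify(
    final_state,
    event_log,
    state_checks=[lambda s: any(e["title"] == "Standup" for e in s["events"])],
    # Exactly one create_event call, i.e. no duplicate creations.
    log_checks=[lambda log: sum(e["action"] == "create_event" for e in log) == 1],
    rubric=[
        RubricItem("Invites both attendees", 2.0,
                   lambda s, log: set(s["events"][0]["attendees"]) == {"ana", "li"}),
        RubricItem("Never deletes an event", 1.0,
                   lambda s, log: all(e["action"] != "delete_event" for e in log)),
        RubricItem("Sets a location", 1.0,
                   lambda s, log: bool(s["events"][0].get("location"))),
    ],
)
print(result)
```

Here the hard gates pass but the event has no location, so the rubric awards 3 of 4 weight points.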

What’s Included

Resource | Description
Environments | 4 website apps + 3 desktop VMs + 45+ MCP servers
Data Packs | Pre-configured datasets that hydrate environments with realistic state
Verifiers | State checks, log checks, and rubric-based evaluation
Sample RL Data | Pre-built tasks with prompts and verification criteria
APIs | HTTP endpoints for programmatic environment control
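Data packs and session isolation can be pictured together: each new session is hydrated from a fresh copy of the same seed data, so changes an agent makes in one session never leak into another. The `SessionStore` class and the data-pack contents below are hypothetical and only illustrate the idea; they are not a description of how Gymnasium implements isolation.

```python
import copy
import uuid

# Hypothetical data pack: seed records for a calendar-like app.
DATA_PACK = {"events": [{"title": "Quarterly review", "attendees": ["ana"]}]}


class SessionStore:
    """Hypothetical per-session state store seeded from a data pack."""

    def __init__(self, data_pack: dict):
        self._data_pack = data_pack
        self._sessions = {}

    def create_session(self) -> str:
        sid = uuid.uuid4().hex
        # Deep copy so every session starts from the same seed and
        # no mutable objects are shared between sessions.
        self._sessions[sid] = copy.deepcopy(self._data_pack)
        return sid

    def state(self, sid: str) -> dict:
        return self._sessions[sid]


store = SessionStore(DATA_PACK)
a = store.create_session()
b = store.create_session()

# An agent action in session a...
store.state(a)["events"].append({"title": "Standup", "attendees": ["li"]})

# ...is invisible to session b and to the seed data.
print(len(store.state(a)["events"]), len(store.state(b)["events"]))  # 2 1
```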

Next Steps

Key Concepts

Understand Environment, Task, Artifact, and Verifier terminology

Choose Your Path

Decide between Web UI and Docker based on your needs