Scale Gymnasium

Scale Gymnasium provides a standardized collection of digital environments for agent training and evaluation. Whether you’re building tool-use agents or GUI agents for browser or desktop use, Gymnasium offers the environments, data, and verification systems you need.

Environments

Website Environments

4 synthetic web apps
Calendr · Cloudfile · Shopora · Pandora’s Inbox
Full-stack websites with databases, logging, and built-in verification

Desktop Environments

3 VM platforms
Windows · Ubuntu · macOS
Isolated virtual machines for computer-use agent testing

MCP Environment

45+ servers · 300+ tools
Calendar · CRM · Email · Shopping · and more
Tool-based agent interactions via Model Context Protocol

Two Ways to Use Scale Gymnasium

Method	Agent Loop	Best For
Gymnasium Web UI	Built-in — Scale provides an agent loop executor	Prototyping, testing, exploration
Docker Images	Bring your own — you implement the agent loop	Large-scale evaluation, CI/CD, training pipelines
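
If you bring your own agent loop, its basic shape is observe, decide, act, repeat. The sketch below is only an illustration under stated assumptions: the `Environment` interface, the `CounterEnv` toy environment, and `run_agent_loop` are hypothetical names invented here, not part of the Scale Gymnasium API or its Docker images.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Environment(Protocol):
    """Hypothetical interface for illustration; not the Scale Gymnasium API."""

    def observe(self) -> str: ...
    def act(self, action: str) -> None: ...
    def done(self) -> bool: ...


@dataclass
class CounterEnv:
    """Toy stand-in environment: the task is to reach a target count."""

    target: int
    count: int = 0
    log: list[str] = field(default_factory=list)

    def observe(self) -> str:
        return f"count={self.count} target={self.target}"

    def act(self, action: str) -> None:
        # Every action is logged, whether or not it changes state.
        self.log.append(action)
        if action == "increment":
            self.count += 1

    def done(self) -> bool:
        return self.count >= self.target


def run_agent_loop(
    env: Environment, policy: Callable[[str], str], max_steps: int = 20
) -> int:
    """Observe, decide, act; stop when the task is done or the budget runs out."""
    steps = 0
    while not env.done() and steps < max_steps:
        observation = env.observe()
        action = policy(observation)  # a model call would go here
        env.act(action)
        steps += 1
    return steps


def always_increment(observation: str) -> str:
    """Trivial policy used in place of a real model."""
    return "increment"


env = CounterEnv(target=3)
steps_taken = run_agent_loop(env, always_increment)
```

The `max_steps` budget is a design choice worth keeping in any real loop: it bounds cost when the agent never reaches a terminal state.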

Web UI Guide

Try environments instantly through the Gymnasium interface

Docker Quick Start

Run environments locally with your own agent loop

Key Capabilities

Session Isolation

Complete data isolation between test sessions ensures no cross-contamination

Comprehensive Logging

Automatic capture of all agent interactions for analysis and debugging

Built-in Verification

State checks, log validation, and rubric-based evaluation framework
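
To make the three verification styles concrete, here is a minimal sketch of how a state check, a log check, and a rubric score might be combined into one pass/fail result. All function names, the `Verdict` type, and the sample data are assumptions made for illustration; Gymnasium's actual verifier framework may look quite different.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    name: str
    passed: bool


def state_check(state: dict, expected: dict) -> Verdict:
    """Pass if every expected key holds the expected value in the final state."""
    ok = all(state.get(key) == value for key, value in expected.items())
    return Verdict("state", ok)


def log_check(log: list[str], required: list[str]) -> Verdict:
    """Pass if the required actions appear in the log in order (gaps allowed)."""
    remaining = iter(log)
    # `action in remaining` consumes the iterator, which enforces ordering.
    ok = all(action in remaining for action in required)
    return Verdict("log", ok)


def rubric_check(
    scores: dict[str, float], weights: dict[str, float], threshold: float
) -> Verdict:
    """Pass if the weighted average of rubric scores meets the threshold."""
    total_weight = sum(weights.values())
    weighted = sum(scores.get(k, 0.0) * w for k, w in weights.items()) / total_weight
    return Verdict("rubric", weighted >= threshold)


def verify(verdicts: list[Verdict]) -> bool:
    """A task passes only if every individual check passes."""
    return all(v.passed for v in verdicts)


# Hypothetical outcome of a "create a calendar event" task.
final_state = {"event_title": "Standup", "attendees": 3}
action_log = ["open_calendar", "click_new_event", "type_title", "save"]
rubric_scores = {"correctness": 1.0, "efficiency": 0.5}
rubric_weights = {"correctness": 3.0, "efficiency": 1.0}

task_passed = verify([
    state_check(final_state, {"event_title": "Standup"}),
    log_check(action_log, ["open_calendar", "save"]),
    rubric_check(rubric_scores, rubric_weights, threshold=0.8),
])
```

State checks look at where the environment ended up, log checks look at how the agent got there, and rubric scoring covers qualities that are hard to express as an exact match.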

What’s Included

Resource	Description
Environments	4 website apps + 3 desktop VMs + 45+ MCP servers
Data Packs	Pre-configured datasets to hydrate environments with realistic state
Verifiers	State checks, log checks, and rubric-based evaluation
Sample RL Data	Pre-built tasks with prompts and verification criteria
APIs	HTTP endpoints for programmatic environment control
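
As a rough mental model of how data packs and session isolation fit together, the sketch below hydrates each new session from a private copy of a data pack, so changes in one session never reach another. `SessionManager`, its methods, and the sample inbox data are hypothetical and exist only for this example; they do not describe how Gymnasium implements isolation.

```python
import copy
import uuid


class SessionManager:
    """Hypothetical helper: each session gets a private copy of a data pack."""

    def __init__(self, data_pack: dict):
        self._data_pack = data_pack
        self._sessions: dict[str, dict] = {}

    def create_session(self) -> str:
        session_id = uuid.uuid4().hex
        # Deep copy so nested records are not shared between sessions.
        self._sessions[session_id] = copy.deepcopy(self._data_pack)
        return session_id

    def state(self, session_id: str) -> dict:
        return self._sessions[session_id]


# Made-up data pack for an email-style environment.
data_pack = {"inbox": [{"subject": "Welcome", "read": False}]}

manager = SessionManager(data_pack)
session_a = manager.create_session()
session_b = manager.create_session()

# Mutating session A leaves session B and the original data pack untouched.
manager.state(session_a)["inbox"][0]["read"] = True
```

A shallow copy would not be enough here, because both sessions would still share the same nested message records.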

Next Steps

Key Concepts

Understand Environment, Task, Artifact, and Verifier terminology

Choose Your Path

Decide between Web UI and Docker based on your needs

