Docker Quick Start

This guide walks you through running Scale Gymnasium environments on your own infrastructure using Docker.

Prerequisites:

Docker installed (version 20.10 or later)
Docker images from Scale (contact Scale to obtain them)

Each environment type has its own section below. Select the environment type you want to deploy:

MCP Environments
Website Environments
Desktop Environments

MCP Environments

Deploy MCP (Model Context Protocol) server environments with 50+ available tools across calendar, email, CRM, filesystem, Slack, and more.

Step 1: Load the Docker Image

The MCP environment is distributed as agent-environment.tar. Load it into your local Docker image store:

docker load -i agent-environment.tar

Verify the image is available:

docker images | grep agent-environment

Step 2: Run the Container

Start the container, exposing port 1984:

docker run -d -p 1984:1984 agent-environment:latest

The environment is now running at http://localhost:1984.

Optionally, load a specific scenario at startup by passing a universe ID through the UNIVERSE_ID environment variable:

docker run -d -p 1984:1984 -e UNIVERSE_ID=my_scenario_123 agent-environment:latest
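Because the container starts detached (-d), the server may need a few seconds before it accepts requests. The guide documents no health-check endpoint, so the minimal sketch below assumes only that the server is ready once it accepts TCP connections on the published port. The helper name wait_for_port is illustrative.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, or False after `timeout` seconds.

    An accepting port shows the server is listening; it does not prove the
    environment has finished loading a scenario.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (e.g. ConnectionRefusedError) until something listens on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call wait_for_port("localhost", 1984) after docker run and before sending the first request.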

Step 3: Initialize a Session

Reset the environment to start a new episode:

Endpoint	Method	Purpose
`/reset`	POST	Reset all MCP servers and start a new episode

You can optionally pass a universe_id in the request body to load a specific scenario:

curl -X POST http://localhost:1984/reset \
  -H "Content-Type: application/json" \
  -d '{"options": {"universe_id": "my_scenario_123"}}'
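The same reset call can be made from Python. This is a minimal sketch using only the standard library, under two assumptions: the server replies with JSON (the helper falls back to raw text otherwise), and resetting without a scenario means sending no body at all, mirroring how universe_id is optional above. The function names and the `post` hook (which lets tests swap out the HTTP call) are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# A transport takes (url, JSON body or None) and returns the decoded reply.
Transport = Callable[[str, Optional[dict]], Any]


def http_post_json(url: str, body: Optional[dict]) -> Any:
    """POST with an optional JSON body; decode the reply as JSON when possible."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        text = resp.read().decode("utf-8")
    try:
        return json.loads(text)
    except ValueError:
        return text  # not JSON: hand back the raw text


def build_reset_body(universe_id: Optional[str] = None) -> Optional[dict]:
    """universe_id is optional and nested under "options", as in the curl example."""
    if universe_id is None:
        return None  # no body: reset without selecting a scenario (assumption)
    return {"options": {"universe_id": universe_id}}


def reset(base_url: str, universe_id: Optional[str] = None,
          post: Transport = http_post_json) -> Any:
    """Reset all MCP servers and start a new episode (POST /reset)."""
    return post(base_url.rstrip("/") + "/reset", build_reset_body(universe_id))
```

For example, reset("http://localhost:1984", "my_scenario_123") sends the same request as the curl command above.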

Step 4: Interact with the Environment

Example: List available tools

curl -X POST http://localhost:1984/list-tools

Example: Call a tool

curl -X POST http://localhost:1984/call-tool \
  -H "Content-Type: application/json" \
  -d '{"tool_name": "calendar_get_calendar_events", "tool_args": {}}'
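An agent loop usually wraps these two endpoints in a small client. The sketch below mirrors the curl examples exactly (/list-tools with no body, /call-tool with tool_name and tool_args) and assumes JSON responses, falling back to raw text. McpEnvClient is an illustrative name, not part of any Scale SDK.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

Transport = Callable[[str, Optional[dict]], Any]


def http_post_json(url: str, body: Optional[dict]) -> Any:
    """POST with an optional JSON body; decode the reply as JSON when possible."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        text = resp.read().decode("utf-8")
    try:
        return json.loads(text)
    except ValueError:
        return text  # not JSON: hand back the raw text


class McpEnvClient:
    """Thin wrapper over the /list-tools and /call-tool endpoints."""

    def __init__(self, base_url: str = "http://localhost:1984",
                 post: Transport = http_post_json):
        self.base_url = base_url.rstrip("/")
        self.post = post

    def list_tools(self) -> Any:
        # Same as `curl -X POST .../list-tools` (no request body).
        return self.post(self.base_url + "/list-tools", None)

    def call_tool(self, tool_name: str, tool_args: Optional[dict] = None) -> Any:
        # Same body shape as the curl example: {"tool_name": ..., "tool_args": {...}}
        body = {"tool_name": tool_name, "tool_args": tool_args or {}}
        return self.post(self.base_url + "/call-tool", body)
```

Usage: McpEnvClient().call_tool("calendar_get_calendar_events") issues the same request as the second curl command.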

See the MCP Environment API Reference for all available endpoints.

Step 5: Verify Results

Verification for MCP environments is covered in the MCP Verifiers guide.

Success!

You’ve deployed an MCP environment locally. You can now:

Scale to multiple parallel containers
Integrate with your training pipeline
Implement your own agent loop
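To run several MCP environments in parallel, give each container its own host port. The sketch below only builds and runs the docker run command from Step 2 (flags -d, -p, -e); the starting port and the sequential port scheme are illustrative choices, not a Scale convention.

```python
import subprocess
from typing import List, Optional, Sequence

IMAGE = "agent-environment:latest"
CONTAINER_PORT = 1984  # port the MCP server listens on inside the container


def docker_run_args(host_port: int, universe_id: Optional[str] = None,
                    image: str = IMAGE) -> List[str]:
    """Build the argv for one detached container published on host_port."""
    args = ["docker", "run", "-d", "-p", f"{host_port}:{CONTAINER_PORT}"]
    if universe_id is not None:
        args += ["-e", f"UNIVERSE_ID={universe_id}"]
    args.append(image)
    return args


def plan_fleet(count: int, base_port: int = 1984,
               universe_ids: Optional[Sequence[str]] = None) -> List[List[str]]:
    """One command per container on host ports base_port, base_port + 1, ..."""
    if universe_ids is not None and len(universe_ids) != count:
        raise ValueError("universe_ids must have one entry per container")
    return [
        docker_run_args(base_port + i, universe_ids[i] if universe_ids else None)
        for i in range(count)
    ]


def launch(commands: List[List[str]]) -> List[str]:
    """Run each command; `docker run -d` prints the new container ID on stdout."""
    ids = []
    for cmd in commands:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        ids.append(result.stdout.strip())
    return ids
```

For example, launch(plan_fleet(4)) starts four containers reachable at http://localhost:1984 through http://localhost:1987.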

Website Environments

Deploy web application environments like Calendr, Cloudfile, Shopora, or Pandora’s Inbox.

Step 1: Load the Docker Image

Website environment images are distributed as tar files (this example uses Calendr). Load the image into your local Docker image store:

docker load -i calendr.tar

Verify the image is available:

docker images | grep calendr

Step 2: Run the Container

Start the container, publishing the port the environment listens on (3000 in this Calendr example):

docker run -d -p 3000:3000 calendr:latest

The environment is now running at http://localhost:3000.

Step 3: Initialize a Session

Create a session and optionally load a data pack:

Endpoint	Method	Purpose
`/reset`	POST	Initialize session with data pack

Include a unique sessionId in the /reset request to isolate your test session. Use the same ID when you open the website in Step 4.
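A minimal sketch of generating a unique session ID and sending it to /reset. Only the sessionId field comes from this guide; the field names for selecting a data pack are not documented here, so `extra` is a hypothetical pass-through for whatever the Website Environment API Reference specifies. All function names are illustrative.

```python
import json
import urllib.request
import uuid
from typing import Optional


def new_session_id(prefix: str = "test") -> str:
    """Return a fresh ID such as 'test-<32 hex chars>'; uniqueness is what isolates sessions."""
    return f"{prefix}-{uuid.uuid4().hex}"


def build_reset_body(session_id: str, extra: Optional[dict] = None) -> dict:
    """sessionId is from the guide; `extra` carries undocumented data-pack fields (hypothetical)."""
    body = dict(extra or {})
    body["sessionId"] = session_id  # set last so it cannot be overwritten by `extra`
    return body


def post_reset(base_url: str, body: dict) -> str:
    """POST the body to /reset and return the raw response text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Usage: sid = new_session_id(); post_reset("http://localhost:3000", build_reset_body(sid)).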

Step 4: Interact with the Environment

Navigate to the website in your browser with your session ID:

http://localhost:3000?session_id=your-session-id
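When session IDs are generated programmatically, build this URL with proper percent-encoding instead of string concatenation. A small sketch (the helper name is illustrative); note that it adds a trailing slash to a bare host, which browsers treat as the same address.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def session_url(base_url: str, session_id: str) -> str:
    """Return base_url with ?session_id=<id>; any existing query string is replaced."""
    scheme, netloc, path, _query, fragment = urlsplit(base_url)
    return urlunsplit((scheme, netloc, path or "/", urlencode({"session_id": session_id}), fragment))
```

For example, session_url("http://localhost:3000", "your-session-id") returns http://localhost:3000/?session_id=your-session-id.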

Your agent can now interact with the page through browser automation. We recommend the browser-use framework for configuring an agent loop that can navigate, click, type, and perform other actions on the website.

See the Website Environment API Reference for full details on /reset, /verifier, and any additional endpoints.

Step 5: Verify Results

Call the verifier endpoint to check task completion:

Endpoint	Method	Purpose
`/verifier`	POST	Run verification checks

Success!

You’ve deployed a website environment locally. You can now:

Scale to multiple parallel containers
Integrate with your training pipeline
Implement your own agent loop

Desktop Environments

Deploy full desktop virtual machine environments running Linux, Windows, or macOS.

Hardware Requirements:

Linux/Windows: Bare-metal hosts with KVM support
macOS: Apple Mac hardware (Mac Metal instances via the Lumier provider)

Step 1: Load the Docker Image

Load the desktop orchestrator image into your local Docker image store:

docker load -i desktop-orchestrator.tar

The orchestrator manages VM disk images (qcow2 format) internally. Contact Scale for access to the VM images for your target operating systems.

Step 2: Run the Container

Start the orchestration container, exposing port 3000:

docker run -d -p 3000:3000 desktop-orchestrator:latest

The orchestration server manages VM lifecycle, noVNC connectivity, and task execution.

Step 3: Initialize a Session

Create a new desktop environment by specifying the OS type:

curl -X POST http://localhost:3000/create_desktop \
  -H "Content-Type: application/json" \
  -d '{
    "os_type": "linux",
    "require_a11y_tree": true,
    "timeout": 3600
  }'

Supported os_type values: linux, windows, macos.

This request returns a task_id for tracking. Poll the task status until the VM is ready:

curl -X GET http://localhost:3000/task_status/{task_id}
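Creating a desktop and waiting for it can be scripted as below. This is a sketch under assumptions: the task_status response is JSON, and because its completion field is not documented here, the caller supplies `is_done`. A hypothetical predicate is lambda s: isinstance(s, dict) and "vm_id" in s, on the assumption that vm_id appears in the finished status. Function names are illustrative.

```python
import json
import time
import urllib.request
from typing import Any, Callable

SUPPORTED_OS = {"linux", "windows", "macos"}


def build_create_desktop_body(os_type: str, require_a11y_tree: bool = True,
                              timeout: int = 3600) -> dict:
    """Body for POST /create_desktop, mirroring the curl example."""
    if os_type not in SUPPORTED_OS:
        raise ValueError(f"unsupported os_type {os_type!r}; expected one of {sorted(SUPPORTED_OS)}")
    return {"os_type": os_type, "require_a11y_tree": require_a11y_tree, "timeout": timeout}


def get_json(url: str) -> Any:
    """GET a URL and decode the JSON reply."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_task(base_url: str, task_id: str, is_done: Callable[[Any], bool],
              fetch: Callable[[str], Any] = get_json, interval: float = 5.0,
              max_wait: float = 1800.0,
              sleep: Callable[[float], None] = time.sleep) -> Any:
    """GET /task_status/{task_id} until is_done(status) is true, then return that status.

    Raises TimeoutError once the total sleep time reaches max_wait.
    """
    url = f"{base_url.rstrip('/')}/task_status/{task_id}"
    waited = 0.0
    while True:
        status = fetch(url)
        if is_done(status):
            return status
        if waited >= max_wait:
            raise TimeoutError(f"task {task_id} not ready after {max_wait} s")
        sleep(interval)
        waited += interval
```

Polling every few seconds with an overall cap keeps a stuck VM boot from blocking a pipeline indefinitely.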

Once the task completes, you’ll receive a vm_id and vnc_url for the environment.

Run task-specific initialization (download assets, open apps, run setup scripts):

curl -X POST http://localhost:3000/initialize_task \
  -H "Content-Type: application/json" \
  -d '{"vm_id": "vm-abc123", "task_config": {...}}'

Step 4: Interact with the Environment

Build your own agent loop by connecting directly to the in-VM server (running on the VM’s mapped port):

Get screenshots: GET /screenshot
Get accessibility tree: GET /accessibility
Execute commands: POST /execute

You can implement your agent loop from scratch by capturing screenshots, sending them to a vision LLM (e.g., GPT-4o, Claude), and executing the returned actions. Alternatively, use the try-cua library after spinning up a CUA server inside the VM.

See the Desktop Environment API Reference for all available endpoints.
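The observe, decide, act loop described above can be sketched independently of any HTTP details. Here get_screenshot would wrap GET /screenshot, execute would wrap POST /execute, and choose_action would call your vision model; all three are supplied by you because their request and response formats are not covered in this guide. Returning None from choose_action is a convention of this sketch for "task finished".

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class StepRecord:
    step: int
    action: Any
    result: Any


def run_agent_loop(
    get_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes, List[StepRecord]], Optional[Any]],
    execute: Callable[[Any], Any],
    max_steps: int = 50,
) -> List[StepRecord]:
    """Repeat screenshot -> model -> execute until the model returns None or max_steps is hit."""
    history: List[StepRecord] = []
    for step in range(max_steps):
        screenshot = get_screenshot()
        action = choose_action(screenshot, history)  # the model also sees prior steps
        if action is None:  # the model signals that the task is finished
            break
        result = execute(action)
        history.append(StepRecord(step, action, result))
    return history
```

Keeping the transport out of the loop makes it easy to test with fakes and to swap in try-cua or a different model later.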

Step 5: Verify Results

Run the task-specific verifier to assess state and return a score:

curl -X POST http://localhost:3000/run_evaluator \
  -H "Content-Type: application/json" \
  -d '{"vm_id": "vm-abc123", "task_config": {...}}'
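/initialize_task and /run_evaluator share the same body shape, {"vm_id", "task_config"}, so one helper can drive both. The task_config contents come from Scale and are passed through untouched. DesktopTaskRunner is an illustrative name; the JSON-response handling and the generous timeout (initialization may download assets) are assumptions.

```python
import json
import urllib.request
from typing import Any, Callable

Transport = Callable[[str, dict], Any]


def http_post_json(url: str, body: dict) -> Any:
    """POST a JSON body; decode the reply as JSON when possible."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:  # long timeout: an assumption
        text = resp.read().decode("utf-8")
    try:
        return json.loads(text)
    except ValueError:
        return text  # not JSON: hand back the raw text


class DesktopTaskRunner:
    """Sends {"vm_id", "task_config"} to the orchestrator's task endpoints."""

    def __init__(self, base_url: str = "http://localhost:3000",
                 post: Transport = http_post_json):
        self.base_url = base_url.rstrip("/")
        self.post = post

    def _send(self, path: str, vm_id: str, task_config: dict) -> Any:
        return self.post(self.base_url + path, {"vm_id": vm_id, "task_config": task_config})

    def initialize(self, vm_id: str, task_config: dict) -> Any:
        return self._send("/initialize_task", vm_id, task_config)

    def evaluate(self, vm_id: str, task_config: dict) -> Any:
        return self._send("/run_evaluator", vm_id, task_config)
```

Passing the same task_config to initialize and evaluate keeps setup and scoring tied to one task definition.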

See the Desktop Verifiers guide for more details on verification.

Success!

You’ve deployed a desktop environment locally. You can now:

Scale to multiple parallel containers
Integrate with your training pipeline
Implement your own agent loop

Next Steps

Web UI Guide: a complete walkthrough of the Gymnasium Web interface
API Reference: full API documentation for all endpoints
