Web UI Guide

The Scale Gymnasium Web UI at gym.scale.com provides a visual interface for exploring environments, running agent loops, and verifying task completion.

Prerequisites: Access to the Scale Gymnasium Web UI at gym.scale.com. Contact Scale if you don’t have access.

MCP Environment

The MCP environment provides access to 45+ MCP servers and 300+ tools for tool-based agent interactions.

Navigation

Select an MCP server from the left sidebar under TOOL USE (e.g., Quickbooks, Hubspot CRM, Calendar, Email, Slack).

Tools Panel

The Tools for [Server] panel on the left shows:

Search tools: Filter tools by name
Refresh: Reload the tool list
List of available tools with descriptions (e.g., quickbooks_create_customer, calendar_get_events)

Click a tool to view its parameters and execute it.
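
Under the Model Context Protocol, a tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for quickbooks_create_customer. It only illustrates the message shape: the argument names (display_name, email) are hypothetical, the real parameters are whatever the tool's parameter view lists, and how the Web UI sends requests internally is not documented here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: check the tool's parameter view for the real names.
request = build_tool_call(
    "quickbooks_create_customer",
    {"display_name": "Acme Corp", "email": "billing@acme.example"},
)
print(json.dumps(request, indent=2))
```

After a call like this, the Server Data panel is the place to confirm that the expected row appeared.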

Server Data Panel

The Server Data panel on the right shows the database state:

Tabs: Switch between data tables (e.g., customers, invoices, bills, items, vendors)
Refresh: Reload the data
Hide: Collapse the panel
Pagination: Navigate through records with Previous/Next

Use this to inspect the current state and verify changes after tool calls.

Tips

Efficient Tool Discovery

Find tools quickly in the MCP interface:

Use the search box to filter by tool name
Select different servers from the sidebar to see their tools
Read tool descriptions for parameter guidance
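
If you export or copy a tool list and want the same kind of name filtering in a script, a minimal sketch could look like the following. It assumes case-insensitive substring matching, which may differ from how the search box actually matches; the Tool class and the placeholder descriptions are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def filter_tools(tools: list[Tool], query: str) -> list[Tool]:
    """Return the tools whose name contains the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return list(tools)  # an empty query keeps every tool
    return [tool for tool in tools if needle in tool.name.lower()]


# Tool names come from the examples above; descriptions are placeholders.
tools = [
    Tool("quickbooks_create_customer", "placeholder description"),
    Tool("calendar_get_events", "placeholder description"),
]
matches = filter_tools(tools, "calendar")
print([tool.name for tool in matches])
```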

Website Environments

Website environments include Calendar, Cloudfile, Shopora, and Pandora’s Inbox.

Interface Views

Each website environment has two tabs:

View	Purpose
GUI View	The live, interactive website
Sample RL Data	Pre-built tasks with prompts and verifiers

You can also view the Application State to see real-time database changes.

GUI View

The live, interactive web application:

Interact with the website as a user would
All interactions are logged automatically
Session-scoped isolation prevents cross-contamination

Application State

Real-time database view:

See all tables (events, users, files, etc.)
Watch state update as you interact
Copy data for verification design

Sample RL Data

Pre-built tasks for testing:

Browse tasks: See available prompts
Open in Agent Executor: Load the task into the agent executor

Running Agent Loops

Select a task from Sample RL Data
Click Open in Agent Executor
Watch the trajectory unfold

Output includes:

Step-by-step trajectory
Screenshots at each action
Model chain of thought
Next goal predictions

After the agent loop completes, click the Execute button next to the verifier to run verification.

Tips

Debugging with Application State

Use the Application State view to understand exactly what changed:

Note the state before your action
Perform the action
Compare the new state
Use this to design accurate verifiers
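
The before/after comparison above can also be done in a script once you have copied two snapshots out of the Application State view. This is a sketch under stated assumptions: each snapshot is shaped as a mapping from table name to a list of row dictionaries, and every row has an id field. The actual export format may differ, so adjust the key and shape to match what you copy.

```python
def diff_state(before: dict, after: dict, key: str = "id") -> dict:
    """Compare two snapshots shaped as {table_name: [row_dict, ...]}."""
    changes = {}
    for table in sorted(set(before) | set(after)):
        # Index rows by their key so they can be matched across snapshots.
        old = {row[key]: row for row in before.get(table, [])}
        new = {row[key]: row for row in after.get(table, [])}
        added = [new[k] for k in new if k not in old]
        removed = [old[k] for k in old if k not in new]
        modified = [
            {"before": old[k], "after": new[k]}
            for k in new
            if k in old and old[k] != new[k]
        ]
        if added or removed or modified:
            changes[table] = {
                "added": added,
                "removed": removed,
                "modified": modified,
            }
    return changes


# Illustrative data only; table and column names are assumptions.
before = {
    "events": [{"id": 1, "title": "Standup"}],
    "users": [{"id": 7, "name": "Ana"}],
}
after = {
    "events": [{"id": 1, "title": "Daily standup"}, {"id": 2, "title": "Retro"}],
    "users": [{"id": 7, "name": "Ana"}],
}
changes = diff_state(before, after)
print(changes)
```

The tables and fields that show up in the diff are good candidates for what a state check should assert.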

Using Sample RL Data as Templates

Sample tasks demonstrate proper task structure:

Study the prompt format
Examine verifier check configuration
Use as templates for custom tasks

Desktop Environments

Desktop environments provide access to Ubuntu, Windows, and Mac virtual machines.

Interface Tabs

Tab	Purpose
Environment & Tools	Live VM sandbox view with controls
Sample RL Data	Pre-built tasks with prompts and verifiers

Environment & Tools

The sandbox view shows the live VM:

Reset: Reset the VM to initial state
Fullscreen: Expand the VM view

Below the sandbox, the Task Configuration section allows you to:

Select a preset task from the dropdown
View the task prompt

Sample RL Data

Browse pre-built tasks:

Each task shows the prompt, task initializer config, and verifier
Click Load Task to load it into the environment

Running a Task

Select an OS from the left sidebar (Ubuntu, Windows, or Mac)
Go to Sample RL Data and click Load Task on a task
The task configuration loads with:
- Task Prompt: The instruction for the agent
- Task Initializer: Setup config (click Execute Initializer to run)
- Verifier: Evaluation config (click Execute Verifier after completion)

Workflow

Click Execute Initializer to set up the VM with required files and applications
Complete the task manually in the VM, or use the Agent Executor
Click Execute Verifier to check if the task was completed successfully
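
The ordering of these three steps matters: the verifier is only meaningful after the initializer has run and the task has been attempted. The sketch below expresses that ordering with plain callables. The function and parameter names are hypothetical stand-ins and do not correspond to a Scale Gymnasium API.

```python
from typing import Callable


def run_task(
    initialize: Callable[[], None],
    perform: Callable[[], None],
    verify: Callable[[], bool],
) -> bool:
    """Run the three workflow steps in order and return the verifier result."""
    initialize()  # Execute Initializer: set up required files and applications
    perform()  # Complete the task, manually or with an agent
    return verify()  # Execute Verifier: check whether the task succeeded


# Stand-in steps that only record the call order; a real run acts on the VM.
calls: list[str] = []
passed = run_task(
    initialize=lambda: calls.append("initializer"),
    perform=lambda: calls.append("task"),
    verify=lambda: calls == ["initializer", "task"],
)
print(passed)
```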

Working with Verifiers

The Web UI provides integrated verification for Website and Desktop environments.

MCP environments do not have verifier support in the Gymnasium UI.

Running Verification

Complete a task (manually or via agent)
Click Execute next to the verifier
View check results

Check Types

Type	Description	Result
State Check	Verifies database changes	Pass/Fail
Log Check	Verifies interaction logs	Pass/Fail
Rubric Check	LLM-evaluated criteria	Pass/Fail/Pending

Interpreting Results

✅ Passed: All checks succeeded
❌ Failed: One or more checks failed (see details)
⏳ Pending: Rubric checks awaiting LLM evaluation
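
One way to reason about how individual check results combine into an overall status is sketched below. It rests on two assumptions that this guide does not confirm: a failed check is decisive even while a rubric check is still pending, and a task with no checks counts as passed. The Status enum and overall_status function are illustrative names.

```python
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    PENDING = "pending"


def overall_status(check_results: list[Status]) -> Status:
    """Combine per-check results into a single task status."""
    # Assumption: a failure is decisive even if a rubric check is still pending.
    if Status.FAILED in check_results:
        return Status.FAILED
    # Otherwise any pending rubric check keeps the whole task pending.
    if Status.PENDING in check_results:
        return Status.PENDING
    return Status.PASSED


# State check passed, log check passed, rubric check awaiting LLM evaluation.
results = [Status.PASSED, Status.PASSED, Status.PENDING]
print(overall_status(results).value)
```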

Next Steps

Key Concepts: Understand the core terminology
Docker Deployment: Run environments on your own infrastructure
