The Scale Gymnasium Web UI at gym.scale.com provides a visual interface for exploring environments, running agent loops, and verifying task completion.
Prerequisites: Access to the Scale Gymnasium Web UI at gym.scale.com. Contact Scale if you don’t have access.
MCP Environment
The MCP environment provides access to 45+ MCP servers and 300+ tools for tool-based agent interactions.
Navigation
Select an MCP server from the left sidebar under TOOL USE (e.g., Quickbooks, Hubspot CRM, Calendar, Email, Slack). The Tools for [Server] panel on the left shows:
Search tools: Filter tools by name
Refresh: Reload the tool list
List of available tools with descriptions (e.g., quickbooks_create_customer, calendar_get_events)
Click a tool to view its parameters and execute it.
Server Data Panel
The Server Data panel on the right shows the database state:
Tabs: Switch between data tables (e.g., customers, invoices, bills, items, vendors)
Refresh: Reload the data
Hide: Collapse the panel
Pagination: Navigate through records with Previous/Next
Use this panel to inspect the current state and verify changes after tool calls.
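For readers who want to see what a tool call looks like outside the UI, the sketch below builds a JSON-RPC 2.0 request in the shape the Model Context Protocol defines for `tools/call`. The tool name is taken from the tool list above. The argument names (`display_name`, `email`) are hypothetical placeholders, since the real parameters are whatever the tool's panel shows, and the helper `build_tool_call` is illustrative only. Nothing here describes how Gymnasium itself sends the request.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Return a JSON-RPC 2.0 request dict for an MCP tools/call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The argument names below are hypothetical; check the tool's parameter
# list in the UI for the real ones.
request = build_tool_call(
    1,
    "quickbooks_create_customer",
    {"display_name": "Acme Corp", "email": "billing@acme.example"},
)
print(json.dumps(request, indent=2))
```

After a call like this succeeds, the new record should appear in the matching Server Data tab once you click Refresh.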
Website Environments
Website environments include Calendar, Cloudfile, Shopora, and Pandora’s Inbox.
Interface Views
Each website environment has two tabs:
| View | Purpose |
| --- | --- |
| GUI View | The live, interactive website |
| Sample RL Data | Pre-built tasks with prompts and verifiers |
You can also view the Application State to see real-time database changes.
GUI View
The live, interactive web application:
Interact with the website as a user would
All interactions are logged automatically
Session-scoped isolation prevents cross-contamination
Application State
Real-time database view:
See all tables (events, users, files, etc.)
Watch state update as you interact
Copy data for verification design
Sample RL Data
Pre-built tasks for testing:
Browse tasks: See available prompts
Open in Agent Executor: Load the task into the agent executor
Running Agent Loops
Select a task from Sample RL Data
Click Open in Agent Executor
Watch the trajectory unfold
Output includes:
Step-by-step trajectory
Screenshots at each action
Model chain of thought
Next goal predictions
After the agent loop completes, click the Execute button next to the verifier to run verification.
Tips
Debugging with Application State
Use the Application State view to understand exactly what changed:
Note the state before your action
Perform the action
Compare the new state
Use this to design accurate verifiers
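The before/after comparison above can be sketched in a few lines of Python. This assumes you have copied one table from Application State twice, as lists of row dictionaries with an `id` column. The table contents and the helper `diff_rows` are hypothetical, and the real export format may differ.

```python
def diff_rows(before, after, key="id"):
    """Compare two snapshots of one table (lists of dicts) by primary key."""
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}
    added = [after_by_key[k] for k in after_by_key if k not in before_by_key]
    removed = [before_by_key[k] for k in before_by_key if k not in after_by_key]
    changed = [
        (before_by_key[k], after_by_key[k])
        for k in before_by_key
        if k in after_by_key and before_by_key[k] != after_by_key[k]
    ]
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical "events" table, copied before and after creating an event.
before = [{"id": 1, "title": "Standup"}]
after = [{"id": 1, "title": "Standup"}, {"id": 2, "title": "Design review"}]

delta = diff_rows(before, after)
print(delta["added"])  # [{'id': 2, 'title': 'Design review'}]
```

The rows that land in `added`, `removed`, and `changed` are the ones a state-based verifier would need to assert on.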
Using Sample RL Data as Templates
Sample tasks demonstrate proper task structure:
Study the prompt format
Examine verifier check configuration
Use as templates for custom tasks
Desktop Environments
Desktop environments provide access to Ubuntu, Windows, and Mac virtual machines.
Interface Tabs
| Tab | Purpose |
| --- | --- |
| Environment & Tools | Live VM sandbox view with controls |
| Sample RL Data | Pre-built tasks with prompts and verifiers |
The sandbox view shows the live VM and offers these controls:
Reset: Reset the VM to its initial state
Fullscreen: Expand the VM view
Below the sandbox, the Task Configuration section allows you to:
Select a preset task from the dropdown
View the task prompt
Sample RL Data
Browse pre-built tasks:
Each task shows the prompt, task initializer config, and verifier
Click Load Task to load it into the environment
Running a Task
Select an OS from the left sidebar (Ubuntu, Windows, or Mac)
Go to Sample RL Data and click Load Task on a task
The task configuration loads with:
Task Prompt: The instruction for the agent
Task Initializer: Setup config (click Execute Initializer to run)
Verifier: Evaluation config (click Execute Verifier after completion)
Workflow
Click Execute Initializer to set up the VM with required files and applications
Complete the task manually in the VM, or use the Agent Executor
Click Execute Verifier to check if the task was completed successfully
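The order of the three workflow steps matters: the verifier only means something if the initializer ran first and the task was attempted. The sketch below models that ordering with a small tracker. The class `DesktopTaskRun`, its step names, and the sample prompt are all hypothetical and are not part of any Gymnasium API. It only illustrates the sequence described above.

```python
from dataclasses import dataclass, field


@dataclass
class DesktopTaskRun:
    """Hypothetical tracker for the three workflow steps."""

    prompt: str
    steps_done: list = field(default_factory=list)

    # Plain class attribute (not a dataclass field): the required step order.
    ORDER = ("initializer", "task", "verifier")

    def mark(self, step):
        """Record a step, refusing any step that is run out of order."""
        done = len(self.steps_done)
        expected = self.ORDER[done] if done < len(self.ORDER) else None
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.steps_done.append(step)


# Hypothetical prompt, used only to make the example concrete.
run = DesktopTaskRun(prompt="Rename report.txt to final.txt")
run.mark("initializer")  # Execute Initializer
run.mark("task")         # complete the task manually or via the Agent Executor
run.mark("verifier")     # Execute Verifier
print(run.steps_done)
```

Calling `run.mark("verifier")` before `run.mark("initializer")` raises `ValueError`, which mirrors the mistake of verifying a VM that was never set up.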
Working with Verifiers
The Web UI provides integrated verification for Website and Desktop environments.
MCP environments do not have verifier support in the Gymnasium UI.
Running Verification
Complete a task (manually or via agent)
Click Execute next to the verifier
View check results
Check Types
| Type | Description | Result |
| --- | --- | --- |
| State Check | Verifies database changes | Pass/Fail |
| Log Check | Verifies interaction logs | Pass/Fail |
| Rubric Check | LLM-evaluated criteria | Pass/Fail/Pending |
Interpreting Results
✅ Passed: All checks succeeded
❌ Failed: One or more checks failed (see details)
⏳ Pending: Rubric checks awaiting LLM evaluation
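One way to read the three outcomes together is as a roll-up over the individual checks. The sketch below is an assumption about how such a roll-up could work, not a description of the UI's internals. In particular, the precedence (a failure outranks a pending rubric check) and the names `Result` and `overall_status` are hypothetical.

```python
from enum import Enum


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    PENDING = "pending"


def overall_status(results):
    """Collapse per-check results into one status.

    Assumed precedence: any failure wins, then any pending check,
    otherwise the run passed.
    """
    results = list(results)
    if any(r is Result.FAIL for r in results):
        return "Failed"
    if any(r is Result.PENDING for r in results):
        return "Pending"
    return "Passed"


# Hypothetical check results for one verifier run.
checks = {
    "state_check": Result.PASS,
    "log_check": Result.PASS,
    "rubric_check": Result.PENDING,
}
print(overall_status(checks.values()))  # Pending
```

Under this reading, a Pending result is not a failure: rerun or refresh once the rubric evaluation has finished before drawing conclusions.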