Environment
The term “Environment” is used interchangeably with “Gymnasium.” It is a simulated system (e.g., MCP, Desktop, Website) in which agents and researchers can operate, interact, and learn. Because an environment is controlled and repeatable, agents can take actions and receive consistent feedback, which supports stable and reliable learning.
Types of Environments
| Type | Description |
|---|---|
| Website Environments | Full-stack web applications (Calendr, Cloudfile, Shopora, Pandora’s Inbox) for GUI-based testing |
| Desktop Environments | Isolated, controllable Virtual Machine desktops (Windows, Ubuntu, macOS) |
| Tool Use Environment (MCP) | 45+ Model Context Protocol servers providing tool-based agent interactions |
Artifacts
A collection of data that defines the initial state of an environment or scenario. It is the baseline data used to populate (“hydrate”) the system for a given task, and the environment can be reset to this initial configuration at any time.
Task
A specific challenge or problem set for an agent to solve within an environment. Most tasks contain the following four components:
- Initial State: The environment state (or Universe) in which the task begins
- Prompt: The instructions given to the agent
- Available Tools: The set of tools the agent can use
- Verifier: The mechanism used to evaluate the agent’s success
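
The four components above can be pictured as a single record. The following is a minimal sketch, not the platform's actual schema: the `Task` class, its field names, the tool names, and the `event_created` verifier are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical types and names; the real task format is not specified here.
State = Dict[str, Any]


@dataclass
class Task:
    initial_state: State                     # environment state the task starts from
    prompt: str                              # instructions given to the agent
    available_tools: List[str]               # names of tools the agent may call
    verifier: Callable[[State, str], float]  # maps (final state, answer) to a reward


def event_created(final_state: State, answer: str) -> float:
    """Toy verifier: reward 1.0 if a 'Standup' event exists in the final state."""
    return 1.0 if "Standup" in final_state.get("events", []) else 0.0


task = Task(
    initial_state={"events": []},
    prompt="Create a calendar event named 'Standup'.",
    available_tools=["create_event", "list_events"],
    verifier=event_created,
)
```

Keeping the verifier as a plain function of the final state and the agent's answer lets the same record describe both tasks that change state and tasks that only return an answer.
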
Task Types
Tasks are primarily categorized into two types based on the agent’s goal:
| Type | Description |
|---|---|
| Information-seeking | Tasks that require the agent to retrieve information and respond, without modifying the environment’s state |
| Action-taking | Tasks that require the agent to perform actions that change the environment’s state (e.g., database mutations) |
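
The distinction in the table comes down to whether the environment's state differs after the agent has acted. A minimal sketch of that check, assuming the state can be represented as a plain dictionary; the `changes_state` helper and the two toy actions are hypothetical:

```python
import copy
from typing import Any, Callable, Dict

State = Dict[str, Any]


def changes_state(action: Callable[[State], Any], state: State) -> bool:
    """Run `action` on a copy of `state` and report whether the copy was modified."""
    working = copy.deepcopy(state)
    action(working)
    return working != state


def count_files(state: State) -> int:
    # Information-seeking: reads the state and returns an answer.
    return len(state["files"])


def delete_draft(state: State) -> None:
    # Action-taking: mutates the state.
    state["files"].remove("draft.txt")


state = {"files": ["draft.txt", "notes.md"]}
print(changes_state(count_files, state))   # False
print(changes_state(delete_draft, state))  # True
```
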
Verifier
The Verifier is the component responsible for measuring the agent’s success and producing a reward signal.
Verifier Access by Environment
| Environment | Verifier Access |
|---|---|
| Website Environments | Container endpoint (POST /verifier) — directly accessible |
| Desktop Environments | Control plane endpoint (POST /run_evaluator) — directly accessible |
| MCP Environment | Scale Gymnasium UI only |
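
As an illustration of the table above, the sketch below builds (but does not send) a verifier request for the two environments that expose an endpoint. Only the paths `/verifier` and `/run_evaluator` and the POST method come from the table; the base URL, the JSON payload, and the `build_verifier_request` helper are assumptions, since the request body is not specified here.

```python
import json
from urllib.request import Request

# Paths are taken from the table; everything else is a hypothetical placeholder.
VERIFIER_PATHS = {
    "website": "/verifier",
    "desktop": "/run_evaluator",
}


def build_verifier_request(environment: str, base_url: str, payload: dict) -> Request:
    """Build a POST request for the environment's verifier endpoint."""
    if environment not in VERIFIER_PATHS:
        # The MCP environment has no direct endpoint (UI only).
        raise ValueError(f"no direct verifier endpoint for {environment!r}")
    url = base_url.rstrip("/") + VERIFIER_PATHS[environment]
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Assumed base URL and payload, for illustration only.
req = build_verifier_request("website", "http://localhost:8000", {"task_id": "example"})
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the response format is likewise not specified here.
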
Agent Loop
The Agent Loop is the system that performs the task programmatically by interacting with the environment.
- Functionality: It is a core feature of the Scale Gymnasium platform and supports running agent models against one or more scenarios
- Output: Running an agent loop on a prompt in the Agent Executor tab produces a trajectory of the steps the agent took, screenshots of specific actions in the environment, and additional information such as the model’s chain of thought and its next goal
- Desktop Integration: For Desktop Environments, a multimodal agent loop attaches to the VM desktop to execute tasks programmatically
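
To make the moving parts concrete, here is a toy sketch of an agent loop that hydrates an environment from artifacts, steps a scripted agent, and records a trajectory. It is not the platform's implementation: `ToyEnvironment`, `scripted_agent`, `run_agent_loop`, and the action format are all hypothetical, a real loop would call a model instead of a scripted policy, and screenshots are omitted.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Step:
    thought: str      # stands in for the model's chain of thought
    action: str       # action sent to the environment
    observation: str  # feedback returned by the environment


class ToyEnvironment:
    """Hypothetical stand-in for an environment hydrated from artifacts."""

    def __init__(self, artifacts: State) -> None:
        self._artifacts = copy.deepcopy(artifacts)
        self.state: State = {}
        self.reset()

    def reset(self) -> None:
        # Restore the initial configuration defined by the artifacts.
        self.state = copy.deepcopy(self._artifacts)

    def step(self, action: str) -> str:
        if action.startswith("create_event:"):
            name = action.split(":", 1)[1]
            self.state["events"].append(name)
            return f"created {name}"
        return "unknown action"


def scripted_agent(prompt: str, history: List[Step]) -> Dict[str, str]:
    """Hypothetical policy: create the event once, then stop."""
    if not history:
        return {"thought": "The event does not exist yet.", "action": "create_event:Standup"}
    return {"thought": "The event exists.", "action": "stop"}


def run_agent_loop(
    env: ToyEnvironment,
    agent: Callable[[str, List[Step]], Dict[str, str]],
    prompt: str,
    max_steps: int = 5,
) -> List[Step]:
    env.reset()
    trajectory: List[Step] = []
    for _ in range(max_steps):
        decision = agent(prompt, trajectory)
        if decision["action"] == "stop":
            break
        observation = env.step(decision["action"])
        trajectory.append(Step(decision["thought"], decision["action"], observation))
    return trajectory


env = ToyEnvironment({"events": []})
trajectory = run_agent_loop(env, scripted_agent, "Create a calendar event named 'Standup'.")
reward = 1.0 if "Standup" in env.state["events"] else 0.0  # toy verifier check
```

The `max_steps` cap keeps the loop bounded even if the agent never decides to stop, and calling `env.reset()` at the start makes repeated runs begin from the same initial state.
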