
Environment

An “Environment” (used interchangeably with “Gymnasium”) is a simulated system (e.g., an MCP server, a desktop, or a website) in which agents and researchers can operate, interact, and learn. Environments are controlled and repeatable: agents take actions and receive consistent feedback, which supports stable and reliable learning.

Types of Environments

Type | Description
Website Environments | Full-stack web applications (Calendr, Cloudfile, Shopora, Pandora’s Inbox) for GUI-based testing
Desktop Environments | Isolated, controllable virtual machine desktops (Windows, Ubuntu, macOS)
Tool Use Environment (MCP) | 45+ Model Context Protocol servers providing tool-based agent interactions

Artifacts

Artifacts are a collection of data that defines the initial state of an environment or scenario. They serve as the baseline data used to populate (“hydrate”) the system for a given task, and the environment can be reset to this initial configuration at any time.
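
The hydrate-and-reset idea can be sketched in a few lines of Python. This is a minimal illustration only: `ToyEnvironment`, `hydrate`, and `reset` are hypothetical names chosen for this example and are not part of any documented API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyEnvironment:
    """Hypothetical in-memory environment, used only for illustration."""
    artifacts: dict                            # baseline data defining the initial state
    state: dict = field(default_factory=dict)  # live state that the agent can change

    def hydrate(self) -> None:
        # Populate the live state from a deep copy so the baseline stays untouched.
        self.state = copy.deepcopy(self.artifacts)

    def reset(self) -> None:
        # Resetting is simply hydrating again from the same artifacts.
        self.hydrate()


env = ToyEnvironment(artifacts={"events": [{"id": 1, "title": "Standup"}]})
env.hydrate()
env.state["events"].append({"id": 2, "title": "Retro"})  # an agent mutates the state
env.reset()                                              # back to the initial configuration
```

Copying the artifacts rather than sharing them is what makes the reset repeatable: no action taken during a task can alter the baseline.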

Task

A Task is a specific challenge or problem posed to an agent within an environment. Most tasks contain the following four components:
  • Initial State: The environment state (or Universe) from which the task begins
  • Prompt: The instructions given to the agent
  • Available Tools: The set of tools the agent can use
  • Verifier: The mechanism used to evaluate the agent’s success
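
As a rough sketch, the four components listed above map naturally onto a small record type. The field names, the verifier signature, and the calendar example are assumptions made for illustration, not the platform's actual task schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """Hypothetical task record mirroring the four components."""
    initial_state: dict                     # environment state the task starts from
    prompt: str                             # instructions given to the agent
    available_tools: list[str]              # names of the tools the agent may call
    verifier: Callable[[dict, str], float]  # (final_state, response) -> reward


def event_exists(final_state: dict, response: str) -> float:
    # Reward 1.0 only if an event titled "Retro" exists in the final state.
    titles = [event["title"] for event in final_state.get("events", [])]
    return 1.0 if "Retro" in titles else 0.0


task = Task(
    initial_state={"events": []},
    prompt="Create a calendar event titled 'Retro'.",
    available_tools=["create_event", "list_events"],  # assumed tool names
    verifier=event_exists,
)
```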

Task Types

Tasks are primarily categorized into two types based on the agent’s goal:
Type | Description
Information-seeking | Tasks that require the agent to retrieve information and respond, without modifying the environment’s state
Action-taking | Tasks that require the agent to perform actions that change the environment’s state (e.g., database mutations)

Verifier

The Verifier is the component responsible for measuring the agent’s success and producing a reward signal.
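
To make the reward signal concrete, here is a hedged sketch of how a verifier might differ between the two task types. The function names, the binary 0.0/1.0 reward, and the checks themselves are assumptions for illustration; actual verifiers may score differently.

```python
def verify_information_seeking(initial_state, final_state, response, expected):
    """Reward 1.0 if the response contains the expected answer and the state is unchanged."""
    unchanged = initial_state == final_state
    answered = expected.lower() in response.lower()
    return 1.0 if (unchanged and answered) else 0.0


def verify_action_taking(final_state, expected_state):
    """Reward 1.0 if the environment ended in the expected state."""
    return 1.0 if final_state == expected_state else 0.0


before = {"orders": [{"id": 7, "status": "shipped"}]}

# Information-seeking: the agent answers a question and leaves the state alone.
info_reward = verify_information_seeking(
    before, before, "Order 7 has shipped.", expected="shipped"
)

# Action-taking: the agent is expected to have cancelled the order.
after = {"orders": [{"id": 7, "status": "cancelled"}]}
action_reward = verify_action_taking(
    after, {"orders": [{"id": 7, "status": "cancelled"}]}
)
```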

Verifier Access by Environment

Environment | Verifier Access
Website Environments | Container endpoint (POST /verifier), directly accessible
Desktop Environments | Control plane endpoint (POST /run_evaluator), directly accessible
MCP Environment | Scale Gymnasium UI only
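
The sketch below shows how a client might build a request for the two directly accessible endpoints. Only the HTTP method and the paths `/verifier` and `/run_evaluator` come from the entries above; the base URL, the JSON content type, and the empty payload are placeholders, since the request schema is not specified here.

```python
import json
import urllib.request

# Paths are taken from the entries above; everything else is an assumption.
VERIFIER_PATHS = {
    "website": "/verifier",       # container endpoint
    "desktop": "/run_evaluator",  # control plane endpoint
}


def build_verifier_request(env_type: str, base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the verifier endpoint."""
    if env_type not in VERIFIER_PATHS:
        # MCP verifiers are reachable through the Scale Gymnasium UI only.
        raise ValueError(f"No directly accessible verifier endpoint for {env_type!r}")
    return urllib.request.Request(
        url=base_url.rstrip("/") + VERIFIER_PATHS[env_type],
        data=json.dumps(payload).encode("utf-8"),  # assumed JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL; substitute the address of your own container.
req = build_verifier_request("website", "http://localhost:8000", {})
# urllib.request.urlopen(req) would send the request.
```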

Agent Loop

The Agent Loop is the system that performs the task programmatically by interacting with the environment.
  • Functionality: It is a core feature of the Scale Gymnasium platform and supports running agent models against one or more scenarios
  • Output: Running an agent loop on a prompt in the Agent Executor tab produces a trajectory of the steps the agent took, screenshots of specific actions in the environment, and additional information such as the model’s chain of thought and its next goal
  • Desktop Integration: For Desktop Environments, a multimodal agent loop is attached to the VM desktop for programmatic task execution
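
A minimal sketch of such a loop, and of the trajectory it produces, might look as follows. The `Step` fields mirror the outputs described above, while `run_agent_loop`, the scripted policy, and the fake environment are stand-ins invented for this example rather than the platform's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    thought: str                 # model's reasoning for this step
    action: str                  # action sent to the environment
    next_goal: str               # what the model intends to do next
    screenshot: Optional[bytes]  # screenshot captured after the action, if any


def run_agent_loop(prompt, policy, act, max_steps=10):
    """Alternate between the model (policy) and the environment (act)."""
    trajectory = []
    observation = prompt
    for _ in range(max_steps):
        thought, action, next_goal = policy(observation)
        if action == "done":  # the policy signals that the task is finished
            break
        observation, screenshot = act(action)
        trajectory.append(Step(thought, action, next_goal, screenshot))
    return trajectory


def scripted_policy(observation):
    # Stand-in for a model call: decides the next action from the observation.
    if "opened" in observation:
        return ("The calendar is open; the task is complete.", "done", "")
    return ("I need to open the calendar.", "open_calendar", "confirm it opened")


def fake_act(action):
    # Stand-in for the environment: returns a new observation and no screenshot.
    return (f"{action}: opened", None)


trajectory = run_agent_loop("Open the calendar.", scripted_policy, fake_act)
```

The `max_steps` cap is a design choice in this sketch: it guarantees that the loop terminates even if the policy never signals completion.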
