Key Concepts Overview

Environment

“Environment” is used interchangeably with “Gymnasium.” An environment is a simulated system (e.g., MCP, Desktop, Website) in which agents and researchers can operate, interact, and learn. Because the system is controlled and repeatable, agents can take actions and receive consistent feedback, which supports stable and reliable learning.

Types of Environments

Type	Description
Website Environments	Full-stack web applications (Calendr, Cloudfile, Shopora, Pandora’s Inbox) for GUI-based testing
Desktop Environments	Isolated, controllable Virtual Machine desktops (Windows, Ubuntu, macOS)
Tool Use Environment (MCP)	45+ Model Context Protocol servers providing tool-based agent interactions

Artifacts

An artifact is a collection of data that defines the initial state of an environment or scenario. It is the baseline data used to populate (“hydrate”) the system for a given task, and the environment can be reset to this initial configuration.
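
The hydrate-and-reset idea can be sketched in a few lines. This is a minimal illustration only: `ToyEnvironment`, its `reset` method, and the shape of the artifact data are hypothetical names chosen for this example and are not the platform's API.

```python
import copy


class ToyEnvironment:
    """Hypothetical in-memory environment (illustrative, not the platform's API)."""

    def __init__(self, artifacts: dict):
        # Keep a private copy of the baseline data so reset() can restore it.
        self._artifacts = copy.deepcopy(artifacts)
        self.state: dict = {}
        self.reset()

    def reset(self) -> dict:
        # "Hydrate" the live state from the baseline artifacts.
        self.state = copy.deepcopy(self._artifacts)
        return self.state


# Assumed artifact shape: a single calendar event.
artifacts = {"events": [{"id": 1, "title": "Standup"}]}
env = ToyEnvironment(artifacts)

env.state["events"].append({"id": 2, "title": "Retro"})  # an agent mutates the state
env.reset()  # the environment returns to its initial configuration
```

Deep copies keep the baseline isolated from the live state, so changes made during one run cannot leak into the next.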

Task

A specific challenge or problem set for an agent to solve within an environment. Most tasks contain the following four components:

Initial State: The environment state (or Universe) from which the task begins
Prompt: The instructions given to the agent
Available Tools: The set of tools the agent can use
Verifier: The mechanism used to evaluate the agent’s success
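
As a rough sketch, the four components above could be grouped into a single record. The `Task` class, its field names, and the `count_verifier` function are assumptions made for illustration; the platform's actual task schema is not described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Task:
    """Hypothetical container for the four task components."""

    initial_state: Dict[str, Any]   # environment state the task starts from
    prompt: str                     # instructions given to the agent
    available_tools: List[str]      # names of the tools the agent may call
    verifier: Callable[[Dict[str, Any], str], float]  # (final state, response) -> reward


def count_verifier(final_state: Dict[str, Any], response: str) -> float:
    # Illustrative check: reward 1.0 if the response mentions the event count.
    expected = str(len(final_state["events"]))
    return 1.0 if expected in response else 0.0


task = Task(
    initial_state={"events": ["Standup", "Retro", "Planning"]},
    prompt="How many events are on the calendar?",
    available_tools=["list_events"],  # hypothetical tool name
    verifier=count_verifier,
)

reward = task.verifier(task.initial_state, "There are 3 events.")
```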

Task Types

Tasks are primarily categorized into two types based on the agent’s goal:

Type	Description
Information-seeking	Tasks that require the agent to retrieve information and respond, without modifying the environment’s state
Action-taking	Tasks that require the agent to perform actions that change the environment’s state (e.g., database mutations)
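
The difference between the two types shows up in what a check would look at. The sketch below assumes that an information-seeking check compares the agent's response and confirms the state is unchanged, while an action-taking check inspects the resulting state. Both functions and the `rows` state layout are hypothetical.

```python
from typing import Any, Dict, List


def verify_information_seeking(
    response: str,
    expected_answer: str,
    state_before: Dict[str, Any],
    state_after: Dict[str, Any],
) -> float:
    # Assumed rule: the answer must appear in the response and
    # the environment state must not have been modified.
    unchanged = state_before == state_after
    correct = expected_answer.lower() in response.lower()
    return 1.0 if (unchanged and correct) else 0.0


def verify_action_taking(
    state_after: Dict[str, Any],
    expected_rows: List[Dict[str, Any]],
) -> float:
    # Assumed rule: every expected row must exist after the agent acts
    # (for example, as the result of a database mutation).
    rows = state_after["rows"]
    return 1.0 if all(row in rows for row in expected_rows) else 0.0


before = {"rows": [{"id": 1}]}
after_insert = {"rows": [{"id": 1}, {"id": 2}]}

info_reward = verify_information_seeking("The table has 1 row.", "1 row", before, before)
action_reward = verify_action_taking(after_insert, [{"id": 2}])
```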

Verifier

The Verifier is the component responsible for measuring the agent’s success and producing a reward signal.

Verifier Access by Environment

Environment	Verifier Access
Website Environments	Container endpoint (`POST /verifier`), directly accessible
Desktop Environments	Control plane endpoint (`POST /run_evaluator`), directly accessible
Tool Use Environment (MCP)	Scale Gymnasium UI only
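
The table above can be expressed as a small helper that builds the verifier request for the two directly accessible environment types. Only the paths `/verifier` and `/run_evaluator` and the `POST` method come from the table. The base URL, the JSON payload, and the `build_verifier_request` helper are placeholders, and the real endpoints may require a different body or authentication.

```python
import json
import urllib.request

# Paths taken from the table; the dictionary keys are illustrative labels.
VERIFIER_PATHS = {
    "website": "/verifier",       # container endpoint
    "desktop": "/run_evaluator",  # control plane endpoint
}


def build_verifier_request(env_type: str, base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the verifier endpoint."""
    if env_type not in VERIFIER_PATHS:
        # The MCP verifier is reachable only through the Scale Gymnasium UI.
        raise ValueError(f"No directly accessible verifier endpoint for {env_type!r}")
    return urllib.request.Request(
        url=base_url.rstrip("/") + VERIFIER_PATHS[env_type],
        data=json.dumps(payload).encode("utf-8"),  # payload shape is an assumption
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL; substitute the address of a running container.
req = build_verifier_request("website", "http://localhost:8000", {})
# urllib.request.urlopen(req) would send the request; it is omitted here.
```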

Agent Loop

The Agent Loop is the system that performs the task programmatically by interacting with the environment.

Functionality: The Agent Loop is a core feature of the Scale Gymnasium platform and supports running agent models against one or more scenarios.
Output: After an agent loop is run on a prompt in the Agent Executor tab, the output should contain a trajectory of the steps the agent took, screenshots of specific actions in the environment, and additional information such as the model’s chain of thought and its next goal.
Desktop Integration: For Desktop Environments, a multimodal agent loop attaches to the VM desktop so that tasks can be executed programmatically.
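
In outline, an agent loop alternates between asking a model for its next action and applying that action to the environment, recording each step in a trajectory. The sketch below is a toy version under stated assumptions: `Step`, `run_agent_loop`, `toy_policy`, and `toy_env_step` are hypothetical, the policy is hard-coded rather than a model, and screenshots are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    thought: str       # stand-in for the model's chain of thought
    action: str        # action the agent chose
    observation: str   # feedback returned by the environment


def run_agent_loop(
    prompt: str,
    policy: Callable[[str], Tuple[str, str]],
    env_step: Callable[[str], str],
    max_steps: int = 5,
) -> List[Step]:
    """Run the policy against the environment and return the trajectory."""
    trajectory: List[Step] = []
    observation = prompt
    for _ in range(max_steps):
        thought, action = policy(observation)
        if action == "done":
            trajectory.append(Step(thought, action, ""))
            break
        observation = env_step(action)
        trajectory.append(Step(thought, action, observation))
    return trajectory


def toy_env_step(action: str) -> str:
    # Stand-in environment: acknowledges whatever action it receives.
    return f"executed {action}"


def toy_policy(observation: str) -> Tuple[str, str]:
    # Stand-in for a model: click once, then declare the task finished.
    if observation.startswith("executed"):
        return ("The click succeeded, so the task is complete.", "done")
    return ("I should click the button first.", "click_button")


trajectory = run_agent_loop("Click the button.", toy_policy, toy_env_step)
```

The `max_steps` cap is a design choice for the sketch: it guarantees the loop terminates even if the policy never returns `done`.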
