Desktop Environments provide a unified, secure, and controllable runtime for desktop workflows, essential for agent evaluation and dataset creation.

Available Platforms

  • 🪟 Windows: Windows-based desktop automation
  • 🐧 Ubuntu: Linux-based desktop automation
  • 🍎 macOS: Mac-based desktop automation

Architecture Overview

The architecture has two layers: a control plane that manages the lifecycle of each desktop, and a data plane of isolated virtual machines (VMs) in which the desktops run.

Control Plane (CUA Service)

Provides lifecycle APIs and orchestration for all desktop environments.
| API | Description |
|---|---|
| create_desktop | Provisions a new VM, registers with metadata store |
| reset_desktop | Destroys and recreates the VM from the base image |
| run_initializer | Executes initialization scripts within the VM |
| run_evaluator | Executes verifiers to validate task completion state |
| close_desktop | Terminates the VM and cleans up ephemeral resources |
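
The lifecycle calls above are typically used in a fixed order: create, initialize, let the agent act, evaluate, close. The sketch below illustrates that ordering against an in-memory stand-in. Only the five API names come from this page; the class `FakeCuaService`, the helper `run_episode`, and every parameter and return type are assumptions made for illustration, not the service's actual signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FakeCuaService:
    """Hypothetical in-memory stand-in for the control plane.

    It only records which lifecycle calls were made; no VM is created.
    """
    calls: List[Tuple[str, str]] = field(default_factory=list)
    _next_id: int = 0

    def create_desktop(self, platform: str) -> str:
        self._next_id += 1
        desktop_id = f"{platform}-{self._next_id}"
        self.calls.append(("create_desktop", desktop_id))
        return desktop_id

    def reset_desktop(self, desktop_id: str) -> None:
        self.calls.append(("reset_desktop", desktop_id))

    def run_initializer(self, desktop_id: str, script: str) -> None:
        self.calls.append(("run_initializer", desktop_id))

    def run_evaluator(self, desktop_id: str, verifier: str) -> bool:
        self.calls.append(("run_evaluator", desktop_id))
        return True  # the stub always reports success

    def close_desktop(self, desktop_id: str) -> None:
        self.calls.append(("close_desktop", desktop_id))


def run_episode(service, platform: str, init_script: str, verifier: str,
                agent: Callable[[str], None]) -> bool:
    """Run one task: create, initialize, act, evaluate, and always close."""
    desktop_id = service.create_desktop(platform)
    try:
        service.run_initializer(desktop_id, init_script)
        agent(desktop_id)  # the agent drives the desktop here
        return service.run_evaluator(desktop_id, verifier)
    finally:
        # Close even if the agent or evaluator raises, so no VM is leaked.
        service.close_desktop(desktop_id)
```

Placing `close_desktop` in a `finally` block is a design choice of this sketch: it keeps ephemeral resources from outliving a failed episode.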

Data Plane (VM Environment)

| Platform | Hosting |
|---|---|
| Windows/Ubuntu | Scale-managed EC2 bare-metal compute (KVM acceleration) |
| macOS | Vendor-hosted on compliant Apple hardware, proxied through Scale ingress |

Network & Access

| Component | Port (VM) | Port (Host) | Access Method |
|---|---|---|---|
| noVNC Web | N/A | 8006 | Browser Access / Human-in-the-loop (SSO-gated) |
| OSWorld API | 5000 | 5000 | AI Agent Access (Programmatic control) |
| VNC Server | 5900 | 5900 | Human Access / Debugging |
| Default Egress | N/A | N/A | Default-Deny: Outbound traffic restricted |
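
When debugging connectivity, it can help to confirm which of the host-side ports listed above accept TCP connections. The following is a minimal sketch using only Python's standard `socket` module. The port numbers come from this page; the dictionary keys and the function names `is_port_open` and `check_desktop_ports` are illustrative, and the sketch assumes the caller already has network access to the host.

```python
import socket
from typing import Dict

# Host-side ports from the Network & Access listing. The key names are
# illustrative labels, not identifiers used by the platform.
HOST_PORTS: Dict[str, int] = {
    "novnc_web": 8006,
    "osworld_api": 5000,
    "vnc_server": 5900,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_desktop_ports(host: str, ports: Dict[str, int] = HOST_PORTS,
                        timeout: float = 1.0) -> Dict[str, bool]:
    """Map each named port to whether it currently accepts connections."""
    return {name: is_port_open(host, port, timeout)
            for name, port in ports.items()}
```

A successful TCP connection only shows that something is listening; it does not verify SSO gating on noVNC or that the API behind port 5000 is healthy.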

VM Isolation Rationale

VMs, rather than containers, are required for two reasons:
  • Security and Isolation: Strong guest isolation and security boundaries for running untrusted desktop applications
  • Behavioral Fidelity: Correct kernel/driver semantics, device semantics, and headful desktop behavior necessary for realistic user flows

Verification Methods

Desktop environments support specialized verification for common applications:
| Application | Verification Method |
|---|---|
| Microsoft Office / LibreOffice | Deep structural comparison of file formats (text, formatting, tables, formulas, charts, embedded objects) |
| Google Chrome | Browser state inspection via Chrome APIs (active tabs, history, bookmarks, extensions, cookies) |
| Operating System | File system inspection, command output validation, configuration file parsing, accessibility tree queries |
| VS Code | Editor state inspection via settings/workspace configuration, code correctness via test execution |
| GIMP | Perceptual image comparison using structural similarity metrics |
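
As an illustration of the operating-system row, a verifier for file system inspection and configuration file parsing could look roughly like the sketch below. It is not the platform's evaluator: the function names `verify_file_contains` and `verify_ini_setting` are hypothetical, and it assumes the task's end state can be checked from a text file and an INI-style configuration file.

```python
import configparser
from pathlib import Path


def verify_file_contains(path: str, expected_substring: str) -> bool:
    """File system inspection: the file exists and contains the given text."""
    p = Path(path)
    return p.is_file() and expected_substring in p.read_text(encoding="utf-8")


def verify_ini_setting(path: str, section: str, key: str, expected: str) -> bool:
    """Configuration file parsing: an INI key holds the expected value."""
    parser = configparser.ConfigParser()
    try:
        # read() returns the list of files it parsed; empty if the file is missing.
        if not parser.read(path, encoding="utf-8"):
            return False
    except configparser.Error:
        return False  # a malformed file counts as a failed check
    # fallback=None avoids an exception when the section or key is absent.
    return parser.get(section, key, fallback=None) == expected
```

Both checks return a plain boolean, so a task-level verdict can be formed by requiring every check for that task to pass.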

Next Steps