Available Platforms
🪟 Windows
Windows-based desktop automation
🐧 Ubuntu
Linux-based desktop automation
🍎 macOS
Mac-based desktop automation
Architecture Overview
The architecture is a layered system: a control plane that orchestrates desktop environment lifecycles, and a data plane of isolated virtual machines in which the desktops run.
Control Plane (CUA Service)
Provides lifecycle APIs and orchestration for all desktop environments.
| API | Description |
|---|---|
| create_desktop | Provisions a new VM and registers it with the metadata store |
| reset_desktop | Destroys the VM and recreates it from the base image |
| run_initializer | Executes initialization scripts within the VM |
| run_evaluator | Executes verifiers to validate the task completion state |
| close_desktop | Terminates the VM and cleans up ephemeral resources |
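The lifecycle APIs imply an order of calls for a single task episode. The following minimal sketch assumes that order is create, initialize, evaluate, close, with reset returning the VM to its freshly provisioned state, and models it as a state machine. `DesktopSession`, `State`, `ALLOWED`, and the dict standing in for the metadata store are hypothetical. This section does not document the real client interface, its signatures, or which calls the service permits in which states.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical lifecycle states for one desktop (not part of the CUA Service API)."""
    PROVISIONED = auto()
    INITIALIZED = auto()
    EVALUATED = auto()
    CLOSED = auto()


# Assumed legal transitions: the states from which each lifecycle call may be made.
ALLOWED = {
    "run_initializer": {State.PROVISIONED},
    "run_evaluator": {State.INITIALIZED},
    "reset_desktop": {State.PROVISIONED, State.INITIALIZED, State.EVALUATED},
    "close_desktop": {State.PROVISIONED, State.INITIALIZED, State.EVALUATED},
}


class DesktopSession:
    """Hypothetical in-memory model of the lifecycle call order."""

    def __init__(self, registry: dict, desktop_id: str, platform: str):
        # Plays the role of create_desktop: provision, then register in the metadata store.
        self.registry = registry
        self.desktop_id = desktop_id
        self.state = State.PROVISIONED
        registry[desktop_id] = {"platform": platform, "state": self.state}

    def _transition(self, call: str, new_state: State) -> None:
        # Reject calls that are not allowed from the current state.
        if self.state not in ALLOWED[call]:
            raise RuntimeError(f"{call} not allowed in state {self.state.name}")
        self.state = new_state
        self.registry[self.desktop_id]["state"] = new_state

    def run_initializer(self) -> None:
        self._transition("run_initializer", State.INITIALIZED)

    def run_evaluator(self) -> None:
        self._transition("run_evaluator", State.EVALUATED)

    def reset_desktop(self) -> None:
        # The VM is recreated from the base image, so the task must be re-initialized.
        self._transition("reset_desktop", State.PROVISIONED)

    def close_desktop(self) -> None:
        self._transition("close_desktop", State.CLOSED)
        # Clean up: drop the entry from the stand-in metadata store.
        del self.registry[self.desktop_id]
```

Encoding the allowed transitions as a table turns misuse, such as evaluating a desktop that was never initialized, into an explicit error instead of a silent no-op.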
Data Plane (VM Environment)
| Platform | Hosting |
|---|---|
| Windows/Ubuntu | Scale-managed EC2 bare-metal compute (KVM acceleration) |
| macOS | Vendor-hosted on compliant Apple hardware, proxied through Scale ingress |
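The hosting table can also be read as a dispatch rule: given a requested platform, choose the backend that hosts its VM. The sketch below stores the table as data. `HostingBackend`, the backend names, and `resolve_backend` are hypothetical and are used only for illustration. The `vendor_hosted` flag reflects only the "Scale-managed" versus "Vendor-hosted" wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostingBackend:
    """Hypothetical descriptor of where a platform's VMs run (mirrors the table above)."""
    name: str
    vendor_hosted: bool  # True when the hardware is operated by an outside vendor
    description: str


_BARE_METAL = HostingBackend(
    "ec2-bare-metal", False, "Scale-managed EC2 bare-metal compute (KVM acceleration)"
)
_MAC_VENDOR = HostingBackend(
    "mac-vendor", True, "Vendor-hosted on compliant Apple hardware, proxied through Scale ingress"
)

# Windows and Ubuntu share one backend; macOS has its own.
BACKENDS = {"windows": _BARE_METAL, "ubuntu": _BARE_METAL, "macos": _MAC_VENDOR}


def resolve_backend(platform: str) -> HostingBackend:
    """Return the hosting backend for a platform name, ignoring case."""
    try:
        return BACKENDS[platform.lower()]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform!r}") from None
```

Keeping the mapping as data means a new platform needs only a new entry, not a new branch in provisioning code.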
Network & Access
| Component | Port (VM) | Port (Host) | Access Method |
|---|---|---|---|
| noVNC Web | N/A | 8006 | Browser Access / Human-in-the-loop (SSO-gated) |
| OSWorld API | 5000 | 5000 | AI Agent Access (Programmatic control) |
| VNC Server | 5900 | 5900 | Human Access / Debugging |
| Default Egress | N/A | N/A | Default-Deny: Outbound traffic restricted |
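The port table and the default-deny egress row can be expressed as a small policy model. In this sketch, `PORTS`, `host_address`, and `egress_allowed` are hypothetical names. The allowlist is also an assumption: this section says only that outbound traffic is denied by default. It does not say whether exceptions exist or how they would be granted.

```python
from typing import FrozenSet

# Ports from the table above; None marks a component with no VM-side port.
PORTS = {
    "novnc": {"vm": None, "host": 8006},
    "osworld_api": {"vm": 5000, "host": 5000},
    "vnc": {"vm": 5900, "host": 5900},
}


def host_address(host: str, component: str) -> str:
    """Return 'host:port' for reaching a component through its host-side port."""
    return f"{host}:{PORTS[component]['host']}"


def egress_allowed(destination: str, allowlist: FrozenSet[str] = frozenset()) -> bool:
    """Default-deny egress check (hypothetical policy model).

    A destination is allowed only if it equals an allowlisted domain or is a
    subdomain of one. With the default empty allowlist, everything is denied.
    """
    destination = destination.lower().rstrip(".")
    return any(
        destination == domain or destination.endswith("." + domain)
        for domain in allowlist
    )
```

The subdomain test matches on a leading dot, so `badexample.com` is not treated as a subdomain of `example.com`.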
VM Isolation Rationale
VMs, rather than containers, are required for two reasons:
- Security and Isolation: Strong guest isolation and security boundaries for running untrusted desktop applications
- Behavioral Fidelity: Correct kernel/driver semantics, device semantics, and headful desktop behavior necessary for realistic user flows
Verification Methods
Desktop environments support specialized verification for common applications:
| Application | Verification Method |
|---|---|
| Microsoft Office / LibreOffice | Deep structural comparison of file formats (text, formatting, tables, formulas, charts, embedded objects) |
| Google Chrome | Browser state inspection via Chrome APIs (active tabs, history, bookmarks, extensions, cookies) |
| Operating System | File system inspection, command output validation, configuration file parsing, accessibility tree queries |
| VS Code | Editor state inspection via settings/workspace configuration, code correctness via test execution |
| GIMP | Perceptual image comparison using structural similarity metrics |
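For GIMP tasks, the table names perceptual comparison using structural similarity. To illustrate the metric, the sketch below computes a single global SSIM over two equal-sized grayscale images, each given as a flat list of pixel values from 0 to 255. This is a simplification: the standard formulation averages the index over local, often Gaussian-weighted, windows. The `0.99` threshold in the hypothetical `images_match` helper is an assumed value, not one taken from the actual verifier.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window (global) SSIM between two equal-length grayscale pixel lists."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    # Stabilizing constants with the conventional K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    # Population variances and covariance over all pixels.
    var_x = fmean([(a - mu_x) * (a - mu_x) for a in x])
    var_y = fmean([(b - mu_y) * (b - mu_y) for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    # Luminance term times the combined contrast-structure term.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x * mu_x + mu_y * mu_y + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


def images_match(x: Sequence[float], y: Sequence[float], threshold: float = 0.99) -> bool:
    """Hypothetical pass/fail rule: images match if SSIM meets the threshold."""
    return global_ssim(x, y) >= threshold
```

In practice, a library routine such as scikit-image's `skimage.metrics.structural_similarity` computes the standard windowed form directly on image arrays.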