Control Plane
Run Evaluator
Evaluate task completion and get a score
POST
Runs the evaluator to check if a task was completed successfully. Compares the current VM state against the expected outcome defined in the task configuration and produces a score from 0.0 to 1.0.
See Desktop Verifiers for the complete list of evaluator functions.
Explore live examples of evaluator configurations at gym.scale.com.
Request Body
VM identifier
Task configuration with evaluator definition
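As a rough illustration, the request body could be assembled as below. The key name `vm_id` is an assumption made for this sketch, since the page only describes the field as "VM identifier"; `func` is likewise an assumed key name. The endpoint path and authentication are not shown because this page does not specify them.

```python
import json

def build_request_body(vm_id: str, task_config: dict) -> str:
    """Serialize a Run Evaluator request body to JSON.

    "vm_id" is a hypothetical key name; check the API schema for the real one.
    """
    if "evaluator" not in task_config:
        raise ValueError("task_config must contain an 'evaluator' object")
    return json.dumps({"vm_id": vm_id, "task_config": task_config})

body = build_request_body(
    "vm-123",  # placeholder identifier
    {"evaluator": {"func": "exact_match"}},  # "func" is an assumed key name
)
```

Rejecting a `task_config` without an `evaluator` object up front avoids a round trip for a request that cannot be scored.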
Evaluator Configuration
The evaluator object within task_config defines how to verify task completion.
Core Fields
Evaluation function to use (e.g., compare_table, file_check, exact_match)
Expected/gold standard file configuration
Result file from the VM to compare
Actions to run before evaluation (e.g., save file, activate app)
Evaluation options and rules
Expected/Result File Types
| Type | Description |
|---|---|
| cloud_file | File hosted at a URL (for gold standard) |
| vm_file | File on the VM filesystem (for result) |
Postconfig Actions
Actions to prepare the VM state before evaluation:
| Type | Description |
|---|---|
| execute | Run a shell command |
| sleep | Wait for the specified number of seconds |
Evaluation Rules
Rules define how files are compared:
| Rule Type | Description |
|---|---|
| pivot_table | Compare pivot table structure |
| freeze | Compare freeze pane settings |
| exact_match | Byte-for-byte comparison |
| structural | Compare document structure |
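The type tables above lend themselves to a quick client-side sanity check before a request is sent. The sketch below is illustrative only: the key names (`func`, `expected`, `result`, `postconfig`, `options`, `rules`, `type`) are assumptions, since this page lists the fields by description rather than by name. The allowed type values are taken from the tables, which may not be exhaustive.

```python
# Values copied from the tables above; treat the sets as possibly incomplete.
FILE_TYPES = {"cloud_file", "vm_file"}
POSTCONFIG_TYPES = {"execute", "sleep"}
RULE_TYPES = {"pivot_table", "freeze", "exact_match", "structural"}

def validate_evaluator(evaluator: dict) -> list[str]:
    """Return a list of problems found in an evaluator config (empty if none)."""
    problems = []
    if not evaluator.get("func"):
        problems.append("missing evaluation function")
    # Expected and result files are optional here, but must use a known type.
    for key in ("expected", "result"):
        spec = evaluator.get(key)
        if spec is not None and spec.get("type") not in FILE_TYPES:
            problems.append(f"{key}: unknown file type {spec.get('type')!r}")
    for i, action in enumerate(evaluator.get("postconfig", [])):
        if action.get("type") not in POSTCONFIG_TYPES:
            problems.append(f"postconfig[{i}]: unknown action type {action.get('type')!r}")
    for i, rule in enumerate(evaluator.get("options", {}).get("rules", [])):
        if rule.get("type") not in RULE_TYPES:
            problems.append(f"rules[{i}]: unknown rule type {rule.get('type')!r}")
    return problems

evaluator = {
    "func": "compare_table",
    "expected": {"type": "cloud_file"},
    "result": {"type": "vm_file"},
    "postconfig": [{"type": "sleep"}],
    "options": {"rules": [{"type": "freeze"}]},
}
problems = validate_evaluator(evaluator)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a config at once.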
Response
Result status (success)
Evaluation result message (describes pass/fail reason)
VM identifier
Task identifier
Score from 0.0 to 1.0 (1.0 = fully completed)
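A caller will usually care most about the score and the message. The sketch below reads only those two values, so it does not depend on how the remaining response fields are named. The key names `score` and `message` are assumed to apply to the current response, and the pass threshold of 1.0 is an illustrative choice, not something the API prescribes.

```python
import json

def parse_result(raw: str, pass_threshold: float = 1.0) -> tuple[float, str, bool]:
    """Extract (score, message, passed) from a Run Evaluator response body."""
    data = json.loads(raw)
    score = float(data["score"])
    # The documented range is 0.0 to 1.0; anything else signals a bad response.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score, data.get("message", ""), score >= pass_threshold

score, message, passed = parse_result(
    '{"score": 0.5, "message": "pivot table missing"}'
)
```

Lowering `pass_threshold` would let partially completed tasks count as passing, which may suit evaluators that award partial credit.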
Response structure may be simplified to {"score": number, "message": string} in future versions.
Example: Excel Pivot Table Verification
This example verifies that a pivot table was correctly created in Excel:
Evaluator Functions
Over 100 functions are available. Common ones:
| Function | Use Case |
|---|---|
| compare_table | Excel/spreadsheet with rules |
| compare_docx_files | Word documents |
| compare_pptx_files | PowerPoint presentations |
| compare_pdfs | PDF files |
| compare_images | Image similarity |
| compare_text_file | Text files |
| exact_match | Byte-for-byte comparison |
| fuzzy_match | Fuzzy string matching |
| check_json | JSON validation |
| check_file_exists | File existence |
| is_extension_installed | VS Code extensions |
| infeasible | Task cannot be completed |
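One way to use the table is to pick a comparison function from the result file's extension. The mapping below is a sketch: the function names come from the table, while the pairing of extensions to functions is an assumption and would need checking against what each function actually accepts.

```python
from pathlib import PurePosixPath

# Function names are from the table above; the extensions are assumed pairings.
FUNCTION_BY_EXTENSION = {
    ".xlsx": "compare_table",
    ".docx": "compare_docx_files",
    ".pptx": "compare_pptx_files",
    ".pdf": "compare_pdfs",
    ".png": "compare_images",
    ".txt": "compare_text_file",
}

def pick_function(result_path: str, default: str = "exact_match") -> str:
    """Choose a function name for a result file, falling back to a byte-for-byte check."""
    suffix = PurePosixPath(result_path).suffix.lower()
    return FUNCTION_BY_EXTENSION.get(suffix, default)

chosen = pick_function("/home/user/report.xlsx")
```

Falling back to `exact_match` is the strictest default; a task author may prefer `fuzzy_match` or an explicit error for unknown file types.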
Best Practices
Use postconfig to save files
Many applications don’t auto-save. Use postconfig to trigger a save before evaluation:
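A save step along these lines might be added as follows. This is a sketch only: the `type` and `command` key names are assumptions, and the `xdotool` command presumes a Linux VM with that tool installed. Substitute whatever save mechanism the target application supports.

```python
def save_before_evaluation(evaluator: dict, save_command: str) -> dict:
    """Return a copy of the evaluator config with a save command appended to postconfig."""
    updated = dict(evaluator)
    save_action = {"type": "execute", "command": save_command}  # assumed key names
    # Append so the save runs after any earlier actions (e.g. activating the app).
    updated["postconfig"] = list(evaluator.get("postconfig", [])) + [save_action]
    return updated

evaluator = save_before_evaluation(
    {"func": "compare_table"},
    "xdotool key ctrl+s",  # assumes xdotool is available on the VM
)
```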
Add delays for UI stability
Use sleep actions between commands to allow the UI to settle:
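The helper below sketches one way to interleave pauses with commands. The key names `type`, `command`, and `seconds` are assumptions, and the one-second default is arbitrary; it also adds a pause after the last command so the UI can settle before the evaluator reads the result.

```python
def with_delays(commands: list[str], seconds: float = 1.0) -> list[dict]:
    """Build postconfig actions that pause after every command."""
    actions = []
    for command in commands:
        actions.append({"type": "execute", "command": command})  # assumed key names
        actions.append({"type": "sleep", "seconds": seconds})  # assumed key names
    return actions

# Placeholder commands; real ones depend on the application under test.
postconfig = with_delays(["first-command", "second-command"], seconds=2)
```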
Use specific rules for complex comparisons
For spreadsheets, define which aspects to compare:
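A minimal sketch of such a configuration, using the pivot_table and freeze rule types from the Evaluation Rules table. The key names `func`, `options`, `rules`, and `type` are assumptions, and real rules may take further parameters that this page does not document.

```python
def spreadsheet_options(*rule_types: str) -> dict:
    """Build an options object listing which spreadsheet aspects to compare."""
    return {"rules": [{"type": rule_type} for rule_type in rule_types]}  # assumed key names

evaluator = {
    "func": "compare_table",  # spreadsheet comparison driven by rules
    "options": spreadsheet_options("pivot_table", "freeze"),
}
```

Listing only the aspects the task actually changes keeps unrelated differences between the gold file and the result file from lowering the score.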