Evaluate task completion and get a score
The `evaluator` object within `task_config` defines how to verify task completion.
compare_table, file_check, exact_match)

| Type | Description |
|---|---|
| cloud_file | File hosted at a URL (for gold standard) |
| vm_file | File on the VM filesystem (for result) |
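
As a rough illustration of how the two file source types might be paired, the sketch below builds an evaluator entry as a Python dict and prints it as JSON. Only the type names `vm_file` and `cloud_file` and the function name `compare_table` come from this page. The key names `func`, `result`, `expected`, `type`, `path`, and `dest`, as well as the URL and file paths, are assumptions made for illustration and may not match the actual schema.

```python
import json

# Hypothetical evaluator entry. Key names and paths are assumed, not confirmed.
evaluator = {
    "func": "compare_table",
    # The file produced by the task, read from the VM filesystem.
    "result": {
        "type": "vm_file",
        "path": "/home/user/report.xlsx",
        "dest": "report.xlsx",
    },
    # The gold-standard file, fetched from a URL.
    "expected": {
        "type": "cloud_file",
        "path": "https://example.com/gold/report.xlsx",
        "dest": "report_gold.xlsx",
    },
}

task_config = {"evaluator": evaluator}
print(json.dumps(task_config, indent=2))
```

Keeping the result on the VM and the gold standard at a URL means the reference file never has to be present in the environment the task runs in.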

| Type | Description |
|---|---|
| execute | Run a shell command |
| sleep | Wait for specified seconds |
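
The two action types above could be combined into a short action list, for example to save a document and then pause before evaluation. The following is a minimal sketch under stated assumptions: the `parameters`, `command`, and `seconds` keys are hypothetical names, the `xdotool` command is only one illustrative way to send a save shortcut, and `unknown_action_types` is a helper invented here rather than part of any library.

```python
import json

# Action types taken from the table above.
KNOWN_ACTION_TYPES = {"execute", "sleep"}


def unknown_action_types(actions):
    """Return the types of any actions that are not in KNOWN_ACTION_TYPES."""
    return [a.get("type") for a in actions if a.get("type") not in KNOWN_ACTION_TYPES]


# Hypothetical action list: send Ctrl+S, then wait for the save to finish.
# The "parameters", "command", and "seconds" keys are assumed names.
actions = [
    {"type": "execute", "parameters": {"command": "xdotool key ctrl+s"}},
    {"type": "sleep", "parameters": {"seconds": 2}},
]

# A made-up "click" action is added only to show what the check reports.
bad = unknown_action_types(actions + [{"type": "click"}])

print(json.dumps(actions, indent=2))
print("unknown types:", bad)
```

Checking action types before running a task is a cheap way to catch typos in a config file early.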

| Rule Type | Description |
|---|---|
| pivot_table | Compare pivot table structure |
| freeze | Compare freeze pane settings |
| exact_match | Byte-for-byte comparison |
| structural | Compare document structure |
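
To make the rule types above more concrete, here is a hedged sketch of how a list of rules might be assembled. The rule type names come from the table. The `rules` and `sheet` keys and the `make_rule` helper are hypothetical, and which rules apply to which comparison function is not specified here.

```python
import json

# Rule types taken from the table above.
KNOWN_RULE_TYPES = {"pivot_table", "freeze", "exact_match", "structural"}


def make_rule(rule_type, **settings):
    """Build one rule dict, rejecting rule types that are not in the table."""
    if rule_type not in KNOWN_RULE_TYPES:
        raise ValueError(f"unknown rule type: {rule_type!r}")
    return {"type": rule_type, **settings}


# Hypothetical options block. "rules" and "sheet" are assumed key names.
options = {
    "rules": [
        make_rule("pivot_table", sheet=0),
        make_rule("freeze", sheet=0),
    ]
}

print(json.dumps(options, indent=2))
```

Listing one rule per property keeps a comparison focused on what the task actually asked for, rather than requiring the whole file to match.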

success){"score": number, "message": string} in future versions.

| Function | Use Case |
|---|---|
| compare_table | Excel/spreadsheet with rules |
| compare_docx_files | Word documents |
| compare_pptx_files | PowerPoint presentations |
| compare_pdfs | PDF files |
| compare_images | Image similarity |
| compare_text_file | Text files |
| exact_match | Byte-for-byte comparison |
| fuzzy_match | Fuzzy string matching |
| check_json | JSON validation |
| check_file_exists | File existence |
| is_extension_installed | VS Code extensions |
| infeasible | Task cannot be completed |
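
For intuition about what a byte-for-byte comparison involves, the sketch below implements a local stand-in using Python's standard `filecmp` module. It is not the library's `exact_match` implementation. The function name `exact_match_score` and the convention of returning `1.0` for a match and `0.0` otherwise are assumptions made for illustration.

```python
import filecmp
import os
import tempfile


def exact_match_score(result_path: str, expected_path: str) -> float:
    """Hypothetical stand-in: 1.0 if both files exist and are byte-identical, else 0.0."""
    if not (os.path.isfile(result_path) and os.path.isfile(expected_path)):
        return 0.0
    # shallow=False forces a content comparison instead of trusting file metadata.
    return 1.0 if filecmp.cmp(result_path, expected_path, shallow=False) else 0.0


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    c = os.path.join(tmp, "c.txt")
    # a and b are identical; c differs by a single extra space.
    for path, data in ((a, b"hello\n"), (b, b"hello\n"), (c, b"hello \n")):
        with open(path, "wb") as fh:
            fh.write(data)

    same = exact_match_score(a, b)
    different = exact_match_score(a, c)
    missing = exact_match_score(a, os.path.join(tmp, "nope.txt"))

print(same, different, missing)  # prints: 1.0 0.0 0.0
```

Because a single differing byte fails the check, a strict comparison like this suits generated text or config files better than office documents, whose saved bytes can vary with embedded metadata.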

Use postconfig to save files

Use `postconfig` to trigger a save before evaluation.

Add delays for UI stability

Add `sleep` actions between commands to allow the UI to settle.

Use specific rules for complex comparisons