MCP Task Design

MCP tasks evaluate an agent’s ability to use tools effectively—calling the right functions with correct parameters to accomplish goals across calendar, email, CRM, filesystem, and other services.

Task Components

| Component | Purpose |
| --- | --- |
| `ENABLED_TOOLS` | Tools available to the model (constraint space) |
| `PROMPT` | User’s query |
| `TRAJECTORY` | Intended sequence of tool calls (ground truth) |
| `GTFA_CLAIMS` | Required factual claims in final response |

Component Details

ENABLED_TOOLS

Defines which tools the agent can access. Constraining tools increases task focus and reduces ambiguity.

{
  "ENABLED_TOOLS": [
    "calendar_get_events",
    "calendar_add_event",
    "email_send",
    "email_search"
  ]
}

Considerations:

Include the tools the task requires, and keep the rest of the set small
Add irrelevant tools only deliberately, to test whether the agent can tell useful tools from useless ones

PROMPT

The natural language instruction given to the agent.

{
  "PROMPT": "Schedule a meeting with Sarah for next Tuesday at 2pm and send her a calendar invite via email"
}

Prompt Design Tips:

Be specific about expected outcomes
Include implicit constraints (e.g., “during business hours”)

TRAJECTORY

The expected sequence of tool calls (ground truth). Used for trajectory-based evaluation.

{
  "TRAJECTORY": [
    {
      "tool": "calendar_add_event",
      "args": {
        "title": "Meeting with Sarah",
        "datetime": "2025-01-21T14:00:00",
        "attendees": ["sarah@example.com"]
      }
    },
    {
      "tool": "email_send",
      "args": {
        "to": "sarah@example.com",
        "subject": "Meeting Invitation",
        "body": "..."
      }
    }
  ]
}

Note: Trajectories represent one valid path—agents may complete tasks via different valid sequences.
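Because a trajectory is only one valid path, a strict equality check against the agent's calls can reject correct runs. The following is a minimal sketch of a more lenient comparison, assuming each recorded call is a dict with `tool` and `args` keys as in the example above. The function name `matches_in_order` and the choice to treat the trajectory as an ordered subsequence are illustrative assumptions, not a description of how any particular evaluator scores trajectories.

```python
from typing import Any

Call = dict[str, Any]


def matches_in_order(expected: list[Call], actual: list[Call]) -> bool:
    """Return True if every expected call appears in `actual`, in order.

    Extra calls in `actual` (for example an exploratory lookup) are ignored.
    Only the argument keys listed in the expected call are compared, so the
    agent may pass additional arguments.
    """
    position = 0
    for want in expected:
        while position < len(actual):
            got = actual[position]
            position += 1
            got_args = got.get("args", {})
            same_tool = got.get("tool") == want["tool"]
            same_args = all(
                got_args.get(key) == value
                for key, value in want.get("args", {}).items()
            )
            if same_tool and same_args:
                break  # this expected call is satisfied; move to the next one
        else:
            return False  # ran out of actual calls before finding a match
    return True


# Hypothetical data for illustration only.
expected = [
    {"tool": "calendar_add_event", "args": {"datetime": "2025-01-21T14:00:00"}},
    {"tool": "email_send", "args": {"to": "sarah@example.com"}},
]
actual = [
    {"tool": "calendar_get_events", "args": {"date": "2025-01-21"}},
    {"tool": "calendar_add_event",
     "args": {"title": "Sync", "datetime": "2025-01-21T14:00:00"}},
    {"tool": "email_send",
     "args": {"to": "sarah@example.com", "subject": "Invite", "body": "Hi"}},
]

print(matches_in_order(expected, actual))                  # True
print(matches_in_order(expected, list(reversed(actual))))  # False
```

A check like this still encodes one ordering, so it is probably best treated as a signal alongside outcome-based claims rather than as the sole pass/fail criterion.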

GTFA_CLAIMS

Ground Truth Factual Assertions (GTFA): the claims that must be present in the agent’s final response.

{
  "GTFA_CLAIMS": [
    "Agent created a calendar event for Tuesday at 2:00 PM",
    "Event includes Sarah as an attendee",
    "Confirmation email was sent to sarah@example.com"
  ]
}
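Claims are written in natural language, so checking them usually needs some kind of judge (a human reviewer or a model). The sketch below only shows the bookkeeping around that judge. It assumes a hypothetical `judge(claim, response)` callable that returns `True` when the response supports the claim, and it treats the task as passed only when every claim is supported. The substring-based judge and its `REQUIRED_TEXT` table are stand-ins for illustration, not a recommended verifier.

```python
from typing import Callable

Judge = Callable[[str, str], bool]


def score_claims(claims: list[str], response: str, judge: Judge) -> dict:
    """Evaluate each claim against the final response and summarize."""
    verdicts = {claim: judge(claim, response) for claim in claims}
    return {
        "verdicts": verdicts,
        "passed": all(verdicts.values()),
        "missing": [claim for claim, ok in verdicts.items() if not ok],
    }


# Toy judge (assumption): each claim is reduced to one substring that must
# appear in the response. A real judge would need to understand paraphrases.
REQUIRED_TEXT = {
    "Agent created a calendar event for Tuesday at 2:00 PM": "2:00 PM",
    "Event includes Sarah as an attendee": "Sarah",
    "Confirmation email was sent to sarah@example.com": "sarah@example.com",
}


def substring_judge(claim: str, response: str) -> bool:
    return REQUIRED_TEXT[claim].lower() in response.lower()


response = "I scheduled the meeting for Tuesday at 2:00 PM with Sarah."
result = score_claims(list(REQUIRED_TEXT), response, substring_judge)

print(result["passed"])   # False
print(result["missing"])  # ['Confirmation email was sent to sarah@example.com']
```

Reporting the missing claims, not just a pass/fail flag, makes it easier to see whether the agent skipped a step or simply failed to mention it.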

Example Task Structure

Complete Example

{
  "ENABLED_TOOLS": [
    "calendar_get_events",
    "calendar_add_event",
    "calendar_delete_event",
    "email_send",
    "contacts_search"
  ],
  "PROMPT": "Check my calendar for Friday. If I have a meeting with the marketing team, cancel it and send an apology email to all attendees.",
  "TRAJECTORY": [
    { "tool": "calendar_get_events", "args": { "date": "Friday" } },
    { "tool": "calendar_delete_event", "args": { "event_id": "mktg-123" } },
    { "tool": "email_send", "args": { "to": ["alice@co.com", "bob@co.com"], "subject": "Meeting Cancelled" } }
  ],
  "GTFA_CLAIMS": [
    "Agent checked Friday's calendar",
    "Agent identified the marketing team meeting",
    "Agent cancelled the meeting",
    "Agent sent apology email to all attendees (Alice and Bob)"
  ]
}
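Before a task like this is used, it can help to check that the file has the four components and that each trajectory step has the expected shape. The sketch below is one way to do that, assuming the task is stored as a JSON object with the field names shown above. The `validate_task` helper and its error messages are hypothetical, not part of any existing tooling.

```python
import json

# Expected top-level fields and their JSON types, taken from the table of
# task components.
REQUIRED_FIELDS = {
    "ENABLED_TOOLS": list,
    "PROMPT": str,
    "TRAJECTORY": list,
    "GTFA_CLAIMS": list,
}


def validate_task(task: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in task:
            problems.append(f"missing field: {field}")
        elif not isinstance(task[field], expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")

    trajectory = task.get("TRAJECTORY")
    if isinstance(trajectory, list):
        for index, call in enumerate(trajectory):
            if not isinstance(call, dict) or "tool" not in call or "args" not in call:
                problems.append(f"TRAJECTORY[{index}] needs 'tool' and 'args' keys")
    return problems


raw = """
{
  "ENABLED_TOOLS": ["calendar_get_events", "calendar_delete_event", "email_send"],
  "PROMPT": "Check my calendar for Friday.",
  "TRAJECTORY": [
    { "tool": "calendar_get_events", "args": { "date": "Friday" } }
  ],
  "GTFA_CLAIMS": ["Agent checked Friday's calendar"]
}
"""

print(validate_task(json.loads(raw)))  # []
print(validate_task({"PROMPT": 1}))
```

This only checks structure. Whether the prompt, trajectory, and claims agree with each other still needs a human read.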

Best Practices

Define clear success criteria

GTFA claims should be specific and unambiguous:

❌ “Agent completed the task correctly”
✅ “Agent sent email to john@example.com with subject containing ‘Q3 Report’”

Include boundary cases

Test agent behavior at the edges:

What if no results are found?
What if multiple matches exist?
What if required information is missing?
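As a sketch of the first case, the complete example above can be turned into a "no results" variant: Friday has no marketing meeting, so the expected path stops after the lookup and the claims describe reporting that outcome. The task is written here as a Python dict for convenience. The claim wording, the `SIDE_EFFECT_TOOLS` set, and the `unexpected_side_effects` helper are illustrative assumptions.

```python
# Variant of the complete example in which the calendar lookup finds no
# marketing meeting. The data pack behind the tools is assumed to reflect that.
NO_MATCH_TASK = {
    "ENABLED_TOOLS": [
        "calendar_get_events",
        "calendar_add_event",
        "calendar_delete_event",
        "email_send",
        "contacts_search",
    ],
    "PROMPT": (
        "Check my calendar for Friday. If I have a meeting with the marketing "
        "team, cancel it and send an apology email to all attendees."
    ),
    "TRAJECTORY": [
        {"tool": "calendar_get_events", "args": {"date": "Friday"}},
    ],
    "GTFA_CLAIMS": [
        "Agent checked Friday's calendar",
        "Agent reported that no marketing team meeting was found",
        "Agent did not cancel any meeting or send any email",
    ],
}

# Tools that change state; none of them should be called in this variant.
SIDE_EFFECT_TOOLS = {"calendar_add_event", "calendar_delete_event", "email_send"}


def unexpected_side_effects(actual_calls: list[dict]) -> list[str]:
    """Names of state-changing tools the agent called when none were expected."""
    return [call["tool"] for call in actual_calls if call["tool"] in SIDE_EFFECT_TOOLS]


good_run = [{"tool": "calendar_get_events", "args": {"date": "Friday"}}]
bad_run = good_run + [{"tool": "calendar_delete_event", "args": {"event_id": "x"}}]

print(unexpected_side_effects(good_run))  # []
print(unexpected_side_effects(bad_run))   # ['calendar_delete_event']
```

The other two cases can be built the same way: keep the prompt, change the underlying data, and rewrite the trajectory and claims to describe the correct behavior for that data.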

Balance tool availability

Too few tools: Task may be impossible
Too many tools: Agent may get confused
Include 1-2 “distractor” tools to test discrimination
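One rough way to check this balance is to compare the enabled tools with the tools used by the reference trajectory. The sketch below assumes the task fields shown earlier, and `tool_balance` is a hypothetical helper. Since a trajectory is only one valid path, a tool it does not use is not necessarily a distractor, so the result is a starting point for review, not a verdict.

```python
def tool_balance(enabled_tools: list[str], trajectory: list[dict]) -> dict:
    """Compare enabled tools with the tools the reference trajectory uses."""
    needed = {call["tool"] for call in trajectory}
    enabled = set(enabled_tools)
    return {
        # Used by the trajectory but not enabled: the task may be impossible.
        "missing": sorted(needed - enabled),
        # Enabled but unused by the trajectory: candidate distractors.
        "distractors": sorted(enabled - needed),
    }


# Values copied from the complete example above.
enabled_tools = [
    "calendar_get_events",
    "calendar_add_event",
    "calendar_delete_event",
    "email_send",
    "contacts_search",
]
trajectory = [
    {"tool": "calendar_get_events", "args": {"date": "Friday"}},
    {"tool": "calendar_delete_event", "args": {"event_id": "mktg-123"}},
    {"tool": "email_send",
     "args": {"to": ["alice@co.com", "bob@co.com"], "subject": "Meeting Cancelled"}},
]

report = tool_balance(enabled_tools, trajectory)
print(report["missing"])      # []
print(report["distractors"])  # ['calendar_add_event', 'contacts_search']
```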

Design for multiple valid paths

Many tasks can be completed in different ways. Ensure GTFA claims verify outcomes, not specific tool sequences.

Test with realistic data

Use data packs that reflect real usage patterns—realistic contact names, plausible email content, believable calendar schedules.

Next Steps

MCP Verifiers

Website Tasks
