MCP Sampling Support

MCP (Model Context Protocol) allows servers to request LLM completions from clients. Zap provides built-in sampling handlers that reuse your existing LiteLLM configuration, making it easy to give MCP servers access to AI capabilities.

Why MCP Sampling?

MCP servers often need to perform LLM inference—summarizing content, generating responses, or processing data. Instead of each server managing its own LLM credentials and configuration, MCP sampling lets the client (Zap) handle all LLM interactions:

  • Centralized credentials - API keys stay in one place
  • Consistent configuration - The same model and temperature settings across servers
  • Provider flexibility - Switch LLM providers without changing servers
  • Cost visibility - All LLM calls go through your client

Quick Start

from zap_ai import Zap, ZapAgent
from zap_ai.mcp.sampling import create_mcp_client

# Create MCP client with sampling support
client = create_mcp_client(
    "path/to/mcp_server.py",
    sampling_handler="litellm",  # Use Zap's LLM provider
    sampling_model="gpt-4o",      # Default model for sampling
)

agent = ZapAgent(
    name="SamplingAgent",
    prompt="You are a helpful assistant.",
    mcp_clients=[client],
)

zap = Zap(agents=[agent])
await zap.start()

How It Works

┌──────────────────────────────────────────────────────────────────────────┐
│                         MCP Sampling Flow                                 │
│                                                                          │
│  1. MCP Server needs LLM completion                                      │
│         │                                                                │
│         ▼                                                                │
│  2. Server sends sampling request to Client                              │
│     • Messages, system prompt                                            │
│     • Model preferences (hints)                                          │
│     • Temperature, max_tokens                                            │
│         │                                                                │
│         ▼                                                                │
│  3. LiteLLMSamplingHandler receives request                              │
│     • Converts messages to LiteLLM format                                │
│     • Extracts model from hints or uses default                          │
│     • Calls Zap's complete() function                                    │
│         │                                                                │
│         ▼                                                                │
│  4. LLM provider returns completion                                      │
│         │                                                                │
│         ▼                                                                │
│  5. Response returned to MCP Server                                      │
└──────────────────────────────────────────────────────────────────────────┘
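Step 3 of the flow above can be sketched in plain Python. This is an illustrative sketch, not Zap's actual implementation: the message shape (a role plus a text content block, as in MCP's SamplingMessage) and the hint format are assumptions.

```python
def to_litellm_messages(messages, system_prompt=None):
    """Convert MCP-style sampling messages into LiteLLM chat messages."""
    chat = []
    if system_prompt:
        # MCP carries the system prompt separately; LiteLLM expects a message
        chat.append({"role": "system", "content": system_prompt})
    for m in messages:
        # Each MCP sampling message has a role and a text content block
        chat.append({"role": m["role"], "content": m["content"]["text"]})
    return chat

def resolve_model(hints, default="gpt-4o"):
    """Use the first model hint from the server if present, else the default."""
    return hints[0]["name"] if hints else default
```

The handler then passes the converted messages and resolved model to the LLM call and returns the completion text to the server.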

API Reference

create_mcp_client()

Factory function for creating MCP clients with sampling support:

from zap_ai.mcp.sampling import create_mcp_client

client = create_mcp_client(
    source="path/to/server.py",
    sampling_handler="litellm",
    sampling_model="gpt-4o",
)

Parameters:

| Parameter        | Type                    | Default  | Description                              |
| ---------------- | ----------------------- | -------- | ---------------------------------------- |
| source           | str                     | required | Path to MCP server or URL                |
| sampling_handler | str \| Callable \| None | None     | Handler specification                    |
| sampling_model   | str                     | "gpt-4o" | Default model for the "litellm" handler  |
| **kwargs         | -                       | -        | Additional FastMCP Client arguments      |

Handler options:

  • None - No sampling support (default)
  • "litellm" - Use the built-in LiteLLMSamplingHandler
  • Callable - Custom async handler function

LiteLLMSamplingHandler

Direct access to the sampling handler class:

from zap_ai.mcp.sampling import LiteLLMSamplingHandler
from fastmcp import Client

handler = LiteLLMSamplingHandler(
    default_model="anthropic/claude-sonnet-4-5-20250929",
    default_temperature=0.7,
    default_max_tokens=1000,
)

client = Client("server.py", sampling_handler=handler)

Parameters:

| Parameter           | Type        | Default  | Description                   |
| ------------------- | ----------- | -------- | ----------------------------- |
| default_model       | str         | "gpt-4o" | LiteLLM model identifier      |
| default_temperature | float       | 0.7      | Sampling temperature (0.0-2.0)|
| default_max_tokens  | int \| None | None     | Max tokens to generate        |

Custom Sampling Handlers

Create a custom handler for specialized behavior:

import logging

from zap_ai.mcp.sampling import create_mcp_client

async def my_handler(messages, params, context):
    """Custom sampling handler with logging."""
    logging.info(f"Sampling request: {len(messages)} messages")

    # Call your preferred LLM
    response = await my_llm_call(messages, params)

    logging.info(f"Response: {len(response)} chars")
    return response

client = create_mcp_client(
    "server.py",
    sampling_handler=my_handler,
)

Using FastMCP's Built-in Handlers

FastMCP provides its own sampling handlers you can use directly:

from fastmcp import Client
from fastmcp.client.sampling import AnthropicSamplingHandler

# Use FastMCP's Anthropic handler directly
client = Client(
    "server.py",
    sampling_handler=AnthropicSamplingHandler(
        default_model="claude-sonnet-4-5-20250929"
    ),
)

agent = ZapAgent(
    name="MyAgent",
    mcp_clients=[client],
    ...
)

Best Practices

Model Selection

# Good: Use model hints in server requests
# Server can request specific models, handler uses default as fallback

# Configure a sensible default
client = create_mcp_client(
    "server.py",
    sampling_handler="litellm",
    sampling_model="gpt-4o",  # Fast, capable default
)

Temperature Settings

| Use Case            | Recommended Temperature |
| ------------------- | ----------------------- |
| Factual responses   | 0.0 - 0.3               |
| Balanced creativity | 0.5 - 0.7               |
| Creative generation | 0.8 - 1.0               |
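One way to apply these ranges is a small helper that maps a use case to handler keyword arguments. The specific values below are illustrative picks from the recommended ranges, not Zap defaults:

```python
# Illustrative temperature picks from the recommended ranges above.
TEMPERATURE_BY_USE_CASE = {
    "factual": 0.2,    # 0.0 - 0.3
    "balanced": 0.6,   # 0.5 - 0.7
    "creative": 0.9,   # 0.8 - 1.0
}

def handler_kwargs(use_case: str) -> dict:
    """Keyword arguments for LiteLLMSamplingHandler, by use case."""
    return {"default_temperature": TEMPERATURE_BY_USE_CASE[use_case]}

# e.g. LiteLLMSamplingHandler(**handler_kwargs("factual"))
```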

Error Handling

The handler returns an empty string if the LLM returns no content:

handler = LiteLLMSamplingHandler()
result = await handler(messages, params, context)
# result is "" if LLM returns None
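If an empty completion should surface as an error rather than be passed back to the server, you can wrap any handler yourself. A minimal sketch (strict_handler is a hypothetical helper, not part of Zap):

```python
async def strict_handler(handler, messages, params, context):
    """Delegate to an existing sampling handler, but raise on empty output."""
    result = await handler(messages, params, context)
    if not result:
        raise ValueError("LLM returned no content for sampling request")
    return result
```

A wrapper like this can be passed to create_mcp_client as a callable handler, since it has the same (messages, params, context) signature.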

See Also