Observability

Zap supports tracing via a pluggable provider system. Tracing helps you understand agent behavior, debug issues, and monitor performance.

Langfuse Integration

Langfuse is currently the supported tracing provider.

Setup

Install with Langfuse support:
```
pip install "zap-ai[langfuse]"
```

Configure environment variables:

```
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
# Optional: set this to your own URL for self-hosted Langfuse
# (the value below is the Langfuse Cloud endpoint)
export LANGFUSE_HOST="https://cloud.langfuse.com"
```
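
A missing key is easier to diagnose if it is caught before the provider is created. The following is a minimal sketch of such a check; the variable names come from the setup step above, while `missing_langfuse_vars` is a hypothetical helper and not part of Zap or Langfuse.

```python
import os

# Variable names come from the setup instructions; the helper is hypothetical.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_vars(env=None):
    """Return the required Langfuse variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_langfuse_vars()
    if missing:
        print("Missing Langfuse settings: " + ", ".join(missing))
    else:
        print("Langfuse credentials found")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.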

Enable tracing in your application:

```
import asyncio

from zap_ai import Zap, ZapAgent
from zap_ai.tracing import set_tracing_provider
from zap_ai.tracing.langfuse_provider import LangfuseTracingProvider

# Initialize the provider and register it
provider = LangfuseTracingProvider()
set_tracing_provider(provider)

# Your normal Zap setup
agent = ZapAgent(name="MyAgent", prompt="...")
zap = Zap(agents=[agent])

async def main():
    await zap.start()

    task = await zap.execute_task(
        agent_name="MyAgent",
        task="Do something",
    )

    # ... wait for completion ...

    # Important: flush traces before shutdown
    await provider.flush()
    await zap.stop()

# Run the async entry point
asyncio.run(main())
```

What Gets Traced

Each task execution creates a trace containing:

| Observation Type | Description |
| --- | --- |
| Task/Trace | Root span for the entire task execution |
| Iteration | Each agentic loop iteration |
| Generation | LLM inference calls with token usage |
| Tool | Tool executions with inputs/outputs |
| Agent | Sub-agent delegations (child workflows) |
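
To picture how these observation types relate, the sketch below models a trace as a small tree. The `Observation` class and `render` helper are hypothetical and not part of Zap, and nesting generations, tool calls, and sub-agents under an iteration is an assumption based on the table above rather than a documented guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Illustrative node in a trace tree; not a Zap class."""

    kind: str  # "task", "iteration", "generation", "tool", or "agent"
    name: str
    children: list = field(default_factory=list)

    def add(self, kind, name):
        """Attach a child observation and return it."""
        child = Observation(kind, name)
        self.children.append(child)
        return child


def render(node, depth=0):
    """Return the tree as indented lines, one per observation."""
    lines = ["  " * depth + f"{node.kind}: {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Assumed layout: one root task, one iteration, three child observations.
task = Observation("task", "MyAgent task")
step = task.add("iteration", "iteration-1")
step.add("generation", "llm-call")
step.add("tool", "search")
step.add("agent", "SubAgent")

print("\n".join(render(task)))
```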

Viewing Traces

After running your agent, view traces in the Langfuse dashboard:

1. Go to cloud.langfuse.com (or your self-hosted instance)
2. Navigate to Traces
3. Click on a trace to see the full execution timeline

You'll see:

- Complete conversation flow
- LLM prompts and responses
- Token usage and costs
- Tool call inputs and outputs
- Sub-agent delegation chains
- Timing for each operation

Custom Tracing Providers

Architecture: Protocol vs ABC

Zap's tracing system has two related but distinct components:

| Component | Purpose | When to Use |
| --- | --- | --- |
| `TracingProvider` (Protocol) | Defines the interface contract | Used internally for type hints. You don't need to interact with this directly. |
| `BaseTracingProvider` (ABC) | Implementation helper with utilities | Use this when building custom providers. |

Any class inheriting from BaseTracingProvider automatically satisfies the TracingProvider protocol, so type checking works seamlessly.
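
The relationship between a protocol and an abstract base class can be shown with plain Python. In this sketch, `TracerProtocol`, `BaseTracer`, and `SimpleTracer` are hypothetical stand-ins, not Zap's actual classes; the point is only that a subclass of the ABC satisfies the protocol structurally, without inheriting from it.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class TracerProtocol(Protocol):
    """Stand-in for an interface contract (hypothetical, not Zap's)."""

    def start(self, name: str) -> str: ...

    def flush(self) -> None: ...


class BaseTracer(ABC):
    """Stand-in for an implementation helper that supplies defaults."""

    @abstractmethod
    def start(self, name: str) -> str:
        """Subclasses must implement this."""

    def flush(self) -> None:
        # Default no-op, so subclasses override it only when needed.
        return None


class SimpleTracer(BaseTracer):
    def start(self, name: str) -> str:
        return f"started:{name}"


def run(tracer: TracerProtocol) -> str:
    # The type hint references the protocol, not the base class.
    result = tracer.start("demo")
    tracer.flush()
    return result


tracer = SimpleTracer()
print(isinstance(tracer, TracerProtocol))
print(run(tracer))
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the methods exist; signatures are checked by a static type checker.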

Building a Custom Provider

Zap provides an abstract base class, `BaseTracingProvider`, that you can extend to implement custom tracing backends. This approach gives you:

- Utility methods for generating trace/span IDs and creating contexts
- Default implementations for optional methods (`add_event`, `set_error`, `flush`, `shutdown`)
- A clear interface showing exactly which methods you need to implement

Minimal Implementation

To create a custom provider, inherit from BaseTracingProvider and implement four methods:

```
from zap_ai.tracing import BaseTracingProvider
from zap_ai.tracing.protocol import ObservationType, TraceContext

class MyTracingProvider(BaseTracingProvider):
    """Custom tracing provider example."""

    async def _start_trace_impl(
        self,
        name,
        session_id=None,
        user_id=None,
        metadata=None,
        tags=None,
    ):
        # Create your trace and return (context, cleanup_data)
        # cleanup_data is passed to _end_trace_cleanup when the trace ends
        ctx = self._create_context()
        return ctx, None  # None = no cleanup needed

    async def _start_observation_impl(
        self,
        name,
        observation_type: ObservationType,
        parent_context: TraceContext,
        metadata=None,
        input_data=None,
    ):
        # Create child observation, preserving the trace_id
        ctx = self._create_child_context(parent_context)
        return ctx, None  # None = no cleanup needed

    async def start_generation(
        self,
        name,
        parent_context: TraceContext,
        model,
        input_messages,
        metadata=None,
    ) -> TraceContext:
        # Track LLM generation
        return self._create_child_context(parent_context)

    async def end_generation(self, context: TraceContext, output, usage=None):
        # Record generation output and token usage
        pass
```

Optional Methods

You can override these methods for additional functionality:

```
class MyTracingProvider(BaseTracingProvider):
    # ... required methods ...

    async def _end_trace_cleanup(self, context, cleanup_data):
        """Called when a trace context manager exits."""
        # cleanup_data is whatever you returned from _start_trace_impl
        pass

    async def _end_observation_cleanup(self, context, cleanup_data):
        """Called when an observation context manager exits."""
        pass

    async def add_event(self, context, name, attributes=None):
        """Log events within observations."""
        pass

    async def set_error(self, context, error):
        """Mark an observation as errored."""
        pass

    async def flush(self):
        """Flush any buffered trace data."""
        pass

    async def shutdown(self):
        """Cleanup resources."""
        pass
```
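
The comments above say that whatever `_start_trace_impl` returns as `cleanup_data` is handed back to `_end_trace_cleanup` when the context manager exits. The sketch below shows one way a base class could wire those two hooks together. `SketchBase`, `RecordingProvider`, and the `start_trace` method are hypothetical stand-ins written for this example; how `BaseTracingProvider` does this internally is an assumption, not something this page documents.

```python
import asyncio
from contextlib import asynccontextmanager


class SketchBase:
    """Hypothetical stand-in for a base class that owns the context manager."""

    @asynccontextmanager
    async def start_trace(self, name):
        context, cleanup_data = await self._start_trace_impl(name)
        try:
            yield context
        finally:
            # Runs on normal exit and when the body raises.
            await self._end_trace_cleanup(context, cleanup_data)

    async def _start_trace_impl(self, name):
        raise NotImplementedError

    async def _end_trace_cleanup(self, context, cleanup_data):
        pass  # optional hook; the default does nothing


class RecordingProvider(SketchBase):
    def __init__(self):
        self.log = []

    async def _start_trace_impl(self, name):
        handle = {"name": name, "open": True}  # pretend backend handle
        self.log.append(("start", name))
        return f"ctx:{name}", handle

    async def _end_trace_cleanup(self, context, cleanup_data):
        cleanup_data["open"] = False  # close the pretend handle
        self.log.append(("end", cleanup_data["name"]))


async def demo():
    provider = RecordingProvider()
    async with provider.start_trace("task-1") as ctx:
        provider.log.append(("body", ctx))
    return provider.log


log = asyncio.run(demo())
print(log)
```

Returning the backend handle as `cleanup_data` keeps it out of the context object while still making it available at exit time.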

Utility Methods

The base class provides these utility methods:

- `_generate_trace_id()`: Generate a unique 32-character hex trace ID
- `_generate_span_id(w3c_format=False)`: Generate a span ID (16 chars if W3C format, 32 otherwise)
- `_create_context(trace_id=None, span_id=None, provider_data=None)`: Create a new TraceContext
- `_create_child_context(parent, span_id=None, provider_data=None)`: Create a child context preserving the parent's trace_id
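
For intuition about the ID lengths and the parent/child relationship, here is a standalone sketch that produces IDs of the same shapes using only the standard library. It is not Zap's implementation: `SketchContext` and the free functions are hypothetical, and the real `TraceContext` may carry different fields.

```python
import secrets
from dataclasses import dataclass
from typing import Any, Optional


def generate_trace_id() -> str:
    # 16 random bytes encode to 32 hex characters.
    return secrets.token_hex(16)


def generate_span_id(w3c_format: bool = False) -> str:
    # 8 bytes give 16 hex characters (the W3C Trace Context span length);
    # otherwise 16 bytes give 32 hex characters.
    return secrets.token_hex(8 if w3c_format else 16)


@dataclass(frozen=True)
class SketchContext:
    """Illustrative stand-in; the real TraceContext may differ."""

    trace_id: str
    span_id: str
    provider_data: Optional[Any] = None


def create_child_context(parent, span_id=None, provider_data=None):
    # The child keeps the parent's trace_id and gets its own span_id.
    return SketchContext(
        trace_id=parent.trace_id,
        span_id=span_id or generate_span_id(),
        provider_data=provider_data,
    )


root = SketchContext(generate_trace_id(), generate_span_id())
child = create_child_context(root)

print(len(root.trace_id), len(generate_span_id(w3c_format=True)))
print(child.trace_id == root.trace_id)
```

Sharing the `trace_id` across all contexts is what lets a backend group every observation of one task under a single trace.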

Register Your Provider

```
from zap_ai.tracing import set_tracing_provider

provider = MyTracingProvider()
set_tracing_provider(provider)
```

Disabling Tracing

By default, Zap uses a no-op tracing provider, so nothing is recorded until you register a provider. To disable tracing again after enabling it:

```
from zap_ai.tracing import reset_tracing_provider

reset_tracing_provider()
```
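
The set/reset behaviour follows a common pattern: a registry that holds the active provider and falls back to a do-nothing default. The sketch below illustrates that pattern in isolation. `ProviderRegistry`, `NoOpProvider`, and `ListProvider` are hypothetical names, and whether Zap stores its provider this way is an assumption.

```python
class NoOpProvider:
    """Stand-in for a provider whose methods do nothing."""

    def record(self, event):
        return None


class ListProvider:
    """Stand-in for a real provider; keeps events in memory."""

    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(event)


class ProviderRegistry:
    """Holds the active provider and restores the no-op default on reset."""

    def __init__(self):
        self._default = NoOpProvider()
        self._current = self._default

    def set(self, provider):
        self._current = provider

    def get(self):
        return self._current

    def reset(self):
        self._current = self._default


registry = ProviderRegistry()
custom = ListProvider()

registry.set(custom)
registry.get().record("traced")

registry.reset()
registry.get().record("dropped")  # handled by the no-op provider

print(custom.events)
```

Because callers always go through `get()`, instrumented code does not need to check whether tracing is enabled.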

Best Practices

- Always flush before shutdown: call `await provider.flush()` so that buffered traces are sent before the process exits
- Use meaningful names: task IDs include the agent name, so descriptive agent names make traces easier to filter
- Add metadata: use context to attach user IDs or session info so that they appear in traces
- Monitor in production: tracing has minimal overhead and is safe for production use
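
The first practice is easiest to follow if the flush lives in a `finally` block, so that it also runs when a task fails. The sketch below demonstrates this with a hypothetical `BufferedProvider` and `run_with_flush` helper; neither is part of Zap, and a real provider's buffering will differ.

```python
import asyncio


class BufferedProvider:
    """Hypothetical provider that buffers events until flush() is called."""

    def __init__(self):
        self.buffer = []
        self.sent = []

    def record(self, event):
        self.buffer.append(event)

    async def flush(self):
        # Move everything buffered so far into the "sent" list.
        self.sent.extend(self.buffer)
        self.buffer.clear()


async def run_with_flush(provider, work):
    try:
        return await work(provider)
    finally:
        # Runs even if work() raises, so buffered traces are not lost.
        await provider.flush()


async def failing_work(provider):
    provider.record("step-1")
    raise RuntimeError("task failed")


async def demo():
    provider = BufferedProvider()
    try:
        await run_with_flush(provider, failing_work)
    except RuntimeError:
        pass  # the error still propagates to the caller after the flush
    return provider


provider = asyncio.run(demo())
print(provider.sent)
```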

API Reference

See the Tracing API for full documentation.