Full observability for your Pipecat voice agents

Add the Tuner observer to your Pipecat pipeline and capture every call's transcript, latency, usage, and cost automatically, so you catch hallucinations, broken flows, and missed intents before your callers do.


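```python
import os
from uuid import uuid4

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask
from tuner_pipecat_sdk import Observer

TUNER_API_KEY = os.environ["TUNER_API_KEY"]  # assumed variable name

observer = Observer(
    api_key=TUNER_API_KEY,
    workspace_id=42,
    agent_id="my-agent",
    call_id=str(uuid4()),  # unique per call
)

# Keep your existing processors (`...`), with TTS feeding the transport output.
# tts, transport, and turn_tracker come from your own agent code.
pipeline = Pipeline([..., tts, transport.output()])

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # per-service latency metrics
        enable_usage_metrics=True,  # token/character usage, used for cost
    ),
    # Observers watch frames flowing through the pipeline without being part of it.
    observers=[observer, observer.latency_observer, turn_tracker],
)
```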

Integrate in under two minutes

No re-architecting your pipeline. The Tuner observer attaches to your existing Pipecat agent and starts capturing production data immediately.

Install the SDK

Run pip install tuner-pipecat-sdk. It works with pipecat-ai 1.0+ on Python 3.11–3.13. If you run pipecat-flows, install the flows extra instead: pip install "tuner-pipecat-sdk[flows]".
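You can confirm that an environment meets the stated requirements (pipecat-ai 1.0+, Python 3.11–3.13) before installing. A standard-library check along these lines works. The function names are illustrative and are not part of the SDK.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Bounds taken from the stated requirements: Python 3.11-3.13, pipecat-ai 1.0+.
MIN_PYTHON = (3, 11)
MAX_PYTHON = (3, 13)
MIN_PIPECAT_MAJOR = 1


def python_supported(major_minor: tuple[int, int]) -> bool:
    """Return True if a (major, minor) Python version falls within 3.11-3.13."""
    return MIN_PYTHON <= tuple(major_minor) <= MAX_PYTHON


def pipecat_supported(version_string: str) -> bool:
    """Return True if a pipecat-ai version string has major version 1 or higher."""
    major = int(version_string.split(".")[0])
    return major >= MIN_PIPECAT_MAJOR


def check_environment() -> list[str]:
    """Return human-readable problems with the current environment (empty if none)."""
    problems = []
    current = tuple(sys.version_info[:2])
    if not python_supported(current):
        problems.append(f"Python {current[0]}.{current[1]} is outside 3.11-3.13")
    try:
        installed = version("pipecat-ai")
    except PackageNotFoundError:
        problems.append("pipecat-ai is not installed")
    else:
        if not pipecat_supported(installed):
            problems.append(f"pipecat-ai {installed} is older than 1.0")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The check compares only the major version of pipecat-ai, which is enough for a "1.0 or newer" rule.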

Set your credentials

Provide your Tuner API key, workspace ID, and agent ID, either as environment variables or inline in code.
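For the environment-variable route, a minimal sketch collects the values and returns them as keyword arguments. These match the Observer parameters in the example at the top of the page (api_key, workspace_id, agent_id, call_id). The variable names TUNER_API_KEY, TUNER_WORKSPACE_ID, and TUNER_AGENT_ID, and the helper observer_kwargs, are hypothetical. Check the Tuner documentation for the names the SDK actually expects.

```python
import os
from collections.abc import Mapping
from uuid import uuid4

# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_VARS = ("TUNER_API_KEY", "TUNER_WORKSPACE_ID", "TUNER_AGENT_ID")


def observer_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build Observer keyword arguments from environment variables.

    Raises RuntimeError listing every missing or empty variable at once,
    so a misconfigured deployment fails before the first call.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "api_key": env["TUNER_API_KEY"],
        # The example snippet passes workspace_id as an integer (42).
        "workspace_id": int(env["TUNER_WORKSPACE_ID"]),
        "agent_id": env["TUNER_AGENT_ID"],
        # A fresh identifier per call, as in the snippet.
        "call_id": str(uuid4()),
    }
```

In the agent, this replaces the inline values: observer = Observer(**observer_kwargs()).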

Create the observer

Use Observer for a plain pipeline or FlowsObserver for pipecat-flows, passing your Tuner API key, workspace ID, and agent ID. Then add it to the observers list of your PipelineTask.

See every call in Tuner

Transcripts, latency, usage, and cost flow into your dashboard automatically, with no manual API calls, ready to analyze and monitor.


One dashboard

Every Pipecat call, scored automatically

Transcripts, latency, usage, and red flags from every session land in one place — so quiet failures surface before they reach your churn data.

Why you need Tuner

Voice agents fail quietly, and at a scale no team can review by hand. Tuner turns every production call into signal you can debug, alert on, test against, and improve.

Debug in minutes, not days

When a call goes wrong, see exactly what happened and where — the full transcript, every turn, latency at each step, tool calls, and conversation state. No more guessing from sparse logs.

Get alerted the moment something breaks

Don’t wait for a customer complaint. Configure your own alerts with multiple triggers and conditions — by red flag, metric threshold, agent, or flow — and get notified the instant quality slips.
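Here is a hypothetical way to model alerts with "multiple triggers and conditions" in plain Python. Each rule fires only when all of its conditions hold, and several rules can fire on the same call. This illustrates the idea only; it is not Tuner's alert configuration format. CallResult, AlertRule, and the condition helpers are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CallResult:
    """Hypothetical per-call summary that alert rules are evaluated against."""
    agent_id: str
    flow: str
    red_flags: set[str] = field(default_factory=set)
    metrics: dict[str, float] = field(default_factory=dict)


Condition = Callable[[CallResult], bool]


def has_red_flag(flag: str) -> Condition:
    return lambda call: flag in call.red_flags


def metric_above(metric: str, threshold: float) -> Condition:
    # A missing metric never triggers the condition.
    return lambda call: call.metrics.get(metric, float("-inf")) > threshold


def for_agent(agent_id: str) -> Condition:
    return lambda call: call.agent_id == agent_id


def in_flow(flow: str) -> Condition:
    return lambda call: call.flow == flow


@dataclass
class AlertRule:
    name: str
    conditions: list[Condition] = field(default_factory=list)

    def matches(self, call: CallResult) -> bool:
        # All conditions must hold; a rule with no conditions never fires.
        return bool(self.conditions) and all(cond(call) for cond in self.conditions)


def triggered_alerts(rules: list[AlertRule], call: CallResult) -> list[str]:
    """Names of every rule that fires for this call, in rule order."""
    return [rule.name for rule in rules if rule.matches(call)]
```

Conditions are plain functions, so new condition types (by agent, flow, red flag, or metric threshold) compose without changing AlertRule.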

Test before you ship

Run call simulations and automated checks over SIP before every launch and after every change, scored against the same evals that monitor production — so you catch regressions before your callers do.

Diagnose your agent at scale

At thousands of calls a day, manual validation is impossible. Tuner finds the patterns, pinpoints where your agent breaks, and suggests how to fix it — so one engineer can stay on top of production.

Analytics that explain production

Understand how your agent actually behaves live: where it breaks, when callers get frustrated, which flows are missing, when it hallucinates, and which tool calls fail the most.
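One of these questions, which tool calls fail the most, comes down to a small aggregation. The sketch below assumes each tool call is reduced to a name and a success flag. The ToolCall type and the tool names in the test are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical reduced view of one tool invocation."""
    name: str
    succeeded: bool


def tool_failure_ranking(calls: list[ToolCall]) -> list[tuple[str, int, int, float]]:
    """Return (tool, failures, total, failure_rate), worst failure rate first.

    Ties are broken by call volume (busier tools first), then by name.
    """
    totals = Counter(call.name for call in calls)
    failures = Counter(call.name for call in calls if not call.succeeded)
    rows = [
        (name, failures[name], total, failures[name] / total)
        for name, total in totals.items()
    ]
    rows.sort(key=lambda row: (-row[3], -row[2], row[0]))
    return rows
```

Ranking by rate rather than raw count keeps a rarely used but always-failing tool from hiding behind a busy tool with occasional errors.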

Everything you need to run Pipecat agents in production

Turn production from a black box into something you can actually monitor, measure, and improve.

Catch failures early

Hallucinations, broken flows, dead air, early hangups, and missed intents are flagged automatically — before they reach your churn data.
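Dead air, one of the flagged failures, can be approximated from turn timestamps by looking for stretches where neither party speaks for longer than a threshold. This is a minimal sketch. It assumes turns are available as (speaker, start, end) times in seconds and uses a 3-second threshold. Tuner's actual detector and threshold are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    speaker: str  # e.g. "user" or "bot"
    start: float  # seconds since the call started
    end: float


def dead_air_gaps(turns: list[Turn], threshold_s: float = 3.0) -> list[tuple[float, float]]:
    """Return (silence_start, silence_end) spans longer than threshold_s.

    Turns may overlap (barge-in), so silence starts only after the latest
    end time seen so far, not after the previous turn's end.
    """
    gaps = []
    latest_end = None
    for turn in sorted(turns, key=lambda t: t.start):
        if latest_end is not None and turn.start - latest_end > threshold_s:
            gaps.append((latest_end, turn.start))
        latest_end = turn.end if latest_end is None else max(latest_end, turn.end)
    return gaps
```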

Component-level latency

See STT, TTS, and LLM latency broken out at p50 and p90, so you know exactly where conversations slow down.
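p50 and p90 are percentiles of each component's latency samples: half of the samples are at or below p50, and 90% are at or below p90. The sketch below uses the nearest-rank method and assumes samples arrive as (component, milliseconds) pairs. Tuner may use a different percentile interpolation.

```python
import math
from collections import defaultdict


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def component_latency(samples) -> dict[str, dict[str, float]]:
    """Group (component, latency_ms) pairs and report p50/p90 per component."""
    by_component = defaultdict(list)
    for component, latency_ms in samples:
        by_component[component].append(latency_ms)
    return {
        component: {"p50": percentile(values, 50), "p90": percentile(values, 90)}
        for component, values in by_component.items()
    }
```

p90 is reported alongside p50 because a component with an acceptable median can still produce the slow turns callers notice.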

Real-time alerts

Get notified the moment red flags or failed evals appear in production, instead of weeks later buried in logs.

Call simulation

Stress-test your agent over SIP before launch and after every change, scored against the same evals that monitor live traffic.

Pipecat Flows capture

Record FlowManager node transitions and tool calls alongside session data when your agent runs on pipecat-flows.

Cost & usage per call

Attach a cost calculator and track LLM, TTS, and STT spend on every session, no separate billing pipeline required.
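At its core, a per-call cost calculator multiplies each provider's usage by its unit price. The sketch below assumes usage is reported as LLM prompt and completion tokens, TTS characters, and STT seconds. The Usage and Prices types and the rates in the test are placeholders. They are not real provider prices or the SDK's calculator interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    llm_prompt_tokens: int
    llm_completion_tokens: int
    tts_characters: int
    stt_seconds: float


@dataclass(frozen=True)
class Prices:
    """Placeholder unit prices in USD; substitute your providers' real rates."""
    per_1k_prompt_tokens: float
    per_1k_completion_tokens: float
    per_1k_tts_characters: float
    per_stt_minute: float


def call_cost(usage: Usage, prices: Prices) -> dict[str, float]:
    """Break one call's spend down by service and return the total."""
    llm = (
        usage.llm_prompt_tokens / 1000 * prices.per_1k_prompt_tokens
        + usage.llm_completion_tokens / 1000 * prices.per_1k_completion_tokens
    )
    tts = usage.tts_characters / 1000 * prices.per_1k_tts_characters
    stt = usage.stt_seconds / 60 * prices.per_stt_minute
    return {"llm": llm, "tts": tts, "stt": stt, "total": llm + tts + stt}
```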

Ship with confidence

Catch regressions the moment a new version ships

Compare agent versions across success rate, red-flag rate, and cost per call, and see exactly which failure types are driving the drop.
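A version comparison can be sketched in three steps. Group calls by agent version, compute the three rates per version, and then diff per-flag rates to see which failure types grew. CallRecord and its fields are assumptions about what a per-call record contains, not Tuner's data model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical per-call record; field names are illustrative."""
    version: str
    succeeded: bool
    red_flags: tuple[str, ...] = ()
    cost: float = 0.0


def summarize_by_version(records: list[CallRecord]) -> dict[str, dict]:
    grouped = defaultdict(list)
    for record in records:
        grouped[record.version].append(record)
    summary = {}
    for version, calls in grouped.items():
        n = len(calls)
        summary[version] = {
            "calls": n,
            "success_rate": sum(c.succeeded for c in calls) / n,
            # Share of calls with at least one red flag.
            "red_flag_rate": sum(bool(c.red_flags) for c in calls) / n,
            "cost_per_call": sum(c.cost for c in calls) / n,
            "flag_counts": Counter(flag for c in calls for flag in c.red_flags),
        }
    return summary


def flag_rate_changes(summary: dict[str, dict], old: str, new: str) -> list[tuple[str, float]]:
    """Per-flag change in occurrences per call from old to new, biggest increase first."""
    before, after = summary[old], summary[new]
    flags = set(before["flag_counts"]) | set(after["flag_counts"])
    deltas = {
        flag: after["flag_counts"][flag] / after["calls"] - before["flag_counts"][flag] / before["calls"]
        for flag in flags
    }
    return sorted(deltas.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing flag counts per call keeps the comparison fair when the two versions handled different call volumes.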

Tuner vs Pipecat Evals

Pipecat Evals is built for development — fast, local behavioral tests you run in CI. Tuner is the production layer that scores real calls after you ship. Most teams run both.

Capability | Tuner | Pipecat Cloud | Pipecat Evals
--- | --- | --- | ---
Vendor-independent observability, eliminating the conflict of a platform evaluating its own output | ✓ | ✗ | ✗
Evals pricing built for scale: per-call pricing with no per-minute surcharge | ✓ | ✗ | ✗
Automated quality scoring (evals) on live calls | ✓ | ✗ | ✗
Voice red flags (hallucination, dead air, early hangup) | ✓ | ✗ | ✗
Root-cause diagnosis with a specific fix, not just metrics | ✓ | ✗ | ✗
30+ voice quality metrics & red flags out of the box | ✓ | ✗ | ✗
Drift & regression alerts over time | ✓ | ✗ | ✗
SIP call simulations with AI agents, using your live evals | ✓ | ✗ | ✗
Turn-by-turn transcripts & latency traces | ✓ | ✓ | ✓


Frequently asked questions

Which Pipecat versions are supported?

Do I have to change my pipeline?

What gets captured?

Can I test my agent before going live?

How long does setup take?

Does it work with SIP / phone calls?

Does Tuner support alerts and monitoring?

Can I define my own evaluations and metrics?

How is Tuner priced?

Is my call data private and secure?