Full observability for your Pipecat voice agents
Add the Tuner observer to your Pipecat pipeline and capture every call's transcript, latency, usage, and cost automatically, so you catch hallucinations, broken flows, and missed intents before your callers do.
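from uuid import uuid4

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask
from tuner_pipecat_sdk import Observer

# TUNER_API_KEY and turn_tracker are assumed to be defined elsewhere in your app
observer = Observer(
    api_key=TUNER_API_KEY,
    workspace_id=42,
    agent_id="my-agent",
    call_id=str(uuid4()),  # one unique ID per call
)

# your existing pipeline stays as it is
pipeline = Pipeline([..., tts, transport.output()])

# register the observers on the task
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
    observers=[observer, observer.latency_observer, turn_tracker],
)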
Integrate in under two minutes
No re-architecting your pipeline. The Tuner observer attaches to your existing Pipecat agent and starts capturing production data immediately.
01
Install the SDK
Run pip install tuner-pipecat-sdk. It works with pipecat-ai 1.0+ on Python 3.11–3.13. Add the flows extra if you run pipecat-flows.
02
Set your credentials
Drop in your Tuner API key, workspace ID, and agent ID — via environment variables or inline in code.
03
Create the observer
Use Observer for a plain pipeline, or FlowsObserver for pipecat-flows, and pass it your Tuner API key, workspace ID, and agent ID.
04
See every call in Tuner
Transcripts, latency, usage, and cost flow into your dashboard automatically, with no manual API calls, ready to analyze and monitor.
Read the Pipecat guide →
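For step 02, one way to keep credentials out of source code is to read them from environment variables at startup. The sketch below is a minimal illustration, not part of the SDK: the variable names TUNER_API_KEY, TUNER_WORKSPACE_ID, and TUNER_AGENT_ID and the helper load_tuner_credentials are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TunerCredentials:
    api_key: str
    workspace_id: int
    agent_id: str


def load_tuner_credentials(env: Mapping[str, str] = os.environ) -> TunerCredentials:
    """Read Tuner credentials from environment variables.

    The variable names are assumptions for this sketch, not names the SDK requires.
    """
    required = ("TUNER_API_KEY", "TUNER_WORKSPACE_ID", "TUNER_AGENT_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast at startup instead of on the first call.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return TunerCredentials(
        api_key=env["TUNER_API_KEY"],
        workspace_id=int(env["TUNER_WORKSPACE_ID"]),  # the snippet at the top uses an integer ID
        agent_id=env["TUNER_AGENT_ID"],
    )
```

The resulting fields can then be passed to Observer as api_key, workspace_id, and agent_id, as in the snippet at the top of the page.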
One dashboard
Every Pipecat call, scored automatically
Transcripts, latency, usage, and red flags from every session land in one place — so quiet failures surface before they reach your churn data.

Why you need Tuner
Voice agents fail quietly, and at a scale no team can review by hand. Tuner turns every production call into signal you can debug, alert on, test against, and improve.
01
Debug in minutes, not days
When a call goes wrong, see exactly what happened and where — the full transcript, every turn, latency at each step, tool calls, and conversation state. No more guessing from sparse logs.
02
Get alerted the moment something breaks
Don’t wait for a customer complaint. Configure your own alerts with multiple triggers and conditions — by red flag, metric threshold, agent, or flow — and get notified the instant quality slips.
03
Test before you ship
Run call simulations and automated checks over SIP before every launch and after every change, scored against the same evals that monitor production — so you catch regressions before your callers do.
04
Diagnose your agent at scale
At thousands of calls a day, manual validation is impossible. Tuner finds the patterns, pinpoints where your agent breaks, and suggests how to fix it — so one engineer can stay on top of production.
05
Analytics that explain production
Understand how your agent actually behaves live: where it breaks, when callers get frustrated, which flows are missing, when it hallucinates, and which tool calls fail the most.
Everything you need to run Pipecat agents in production
Turn production from a black box into something you can actually monitor, measure, and improve.
Catch failures early
Hallucinations, broken flows, dead air, early hangups, and missed intents are flagged automatically — before they reach your churn data.
Component-level latency
See STT, TTS, and LLM latency broken out at p50 and p90, so you know exactly where conversations slow down.
Real-time alerts
Get notified the moment red flags or failed evals appear in production, instead of weeks later buried in logs.
Call simulation
Stress-test your agent over SIP before launch and after every change, scored against the same evals that monitor live traffic.
Pipecat Flows capture
Record FlowManager node transitions and tool calls alongside session data when your agent runs on pipecat-flows.
Cost & usage per call
Attach a cost calculator and track LLM, TTS, and STT spend on every session, with no separate billing pipeline required.
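Two of the figures mentioned above, p50/p90 latency and cost per call, are simple to reason about by hand. The sketch below shows one way they could be computed from raw samples. It is not the SDK's cost calculator interface, and the usage keys and unit rates in the usage note are made-up placeholders.

```python
from statistics import quantiles


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50 and p90 for latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=10) returns the nine cut points at 10%, 20%, ..., 90%.
    cuts = quantiles(samples_ms, n=10, method="inclusive")
    return {"p50": cuts[4], "p90": cuts[8]}


def call_cost(usage: dict[str, float], rates: dict[str, float]) -> float:
    """Sum units * unit rate per component; both dicts share the same keys."""
    return sum(units * rates[component] for component, units in usage.items())
```

With hypothetical rates such as {"llm_tokens": 0.000002, "tts_characters": 0.00002, "stt_seconds": 0.0001}, call_cost adds up the LLM, TTS, and STT spend for one session, and latency_percentiles can be run separately on the STT, LLM, and TTS samples to see which component slows a conversation down.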
Ship with confidence
Catch regressions the moment a new version ships
Compare agent versions across success rate, red-flag rate, and cost per call, and see exactly which failure types are driving the drop.

Tuner vs Pipecat Evals
Pipecat Evals is built for development — fast, local behavioral tests you run in CI. Tuner is the production layer that scores real calls after you ship. Most teams run both.
✓ Vendor-independent observability, which removes the conflict of interest of a platform evaluating its own output
✓ Evals pricing built for scale: Tuner prices per call, with no per-minute surcharge
✓ Automated quality scoring (evals) on live calls
✓ Voice red flags (hallucination, dead air, early hangup)
✓ Root-cause diagnosis with a specific fix, not just metrics
✓ 30+ voice quality metrics & red flags out of the box
✓ Drift & regression alerts over time
✓ SIP call simulations with AI agents, using your live evals
✓ Turn-by-turn transcripts & latency traces
Frequently asked questions
Which Pipecat versions are supported?
Do I have to change my pipeline?
What gets captured?
Can I test my agent before going live?
How long does setup take?
Does it work with SIP / phone calls?
Does Tuner support alerts and monitoring?
Can I define my own evaluations and metrics?
How is Tuner priced?
Is my call data private and secure?
