The Vapi Transient Flow, Explained Properly

Mohamed Salem

Product @ Tuner

Learn how the Vapi transient flow works and when to use it. A clear, practical guide to running one agent that adapts to every business you serve, instead of building and managing a separate agent for each one.

What Is the Transient Flow?

To understand transient, it helps to see the two approaches it sits between, which are how most teams run voice agents today. One keeps everything on the platform; the other keeps everything on your own servers. Here they are.

1. The Persistent Flow (Vapi holds your agent)

This is the default way of building on Vapi: you create an assistant in the dashboard, Vapi saves it, and you get an assistant ID. Every call points at that ID.

Great for one agent. It breaks down when you build for many businesses, because now you’re creating, editing, and deleting a saved assistant for every client, and keeping all of it in sync with your own app. Two copies of the truth, slowly drifting apart.

2. The Custom Flow (you run everything yourself)

The open-source route (LiveKit, Pipecat). You run the whole agent on your own servers. Total control, nothing stored on a platform.

The downside is that you now own the hard realtime work: turning speech into text, running the model, turning the reply back into speech, and catching when the caller stops talking, all fast enough to feel like a real conversation. Frameworks like LiveKit and Pipecat give you a big head start on this, but it’s still yours to run, scale, monitor, and fix.

Transient Sits in the Middle

This is the nice part. You keep the agent on your side, like the custom-stack crowd, but Vapi still runs the call and handles all the hard realtime work, like the persistent crowd.

How it works: instead of saving an assistant and sending its ID, you send the whole agent inline when the call starts (the prompt, the tools, the voice, all of it). Vapi runs the call with it and stores nothing. The agent exists for that one call, then it’s gone.

  • Persistent: Vapi holds the agent, you point at it.

  • Transient: you hold the agent, you hand Vapi a fresh copy each call.

Same call quality, same engine. The only thing that changed is who keeps the source of truth.
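
For an inbound call, the difference shows up in what your server returns when Vapi asks which assistant to run. Here is a minimal sketch, assuming the response may carry either an `assistantId` (persistent) or an inline `assistant` object (transient). The ID `asst_123`, the helper names, and the `config` fields are made up for illustration.

```javascript
// Persistent: point at an assistant that Vapi has saved.
// The ID passed in below is a placeholder, not a real assistant.
function persistentResponse(savedAssistantId) {
  return { assistantId: savedAssistantId };
}

// Transient: send the whole agent inline; nothing is saved on Vapi's side.
// `config` is a hypothetical record from your own database.
function transientResponse(config) {
  return {
    assistant: {
      firstMessage: config.greeting,
      model: {
        provider: "openai",
        model: "gpt-4o",
        messages: [{ role: "system", content: config.systemPrompt }],
      },
    },
  };
}

const persistent = persistentResponse("asst_123");
const transient = transientResponse({
  greeting: "Thanks for calling!",
  systemPrompt: "You are a helpful receptionist.",
});

console.log(Object.keys(persistent)); // [ 'assistantId' ]
console.log(Object.keys(transient));  // [ 'assistant' ]
```

Both shapes answer the same question; only the second one carries the agent itself.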

(These are what Vapi calls transient and permanent configurations, documented here.)

When You’d Actually Use Transient

1. You’re building B2B, for lots of different businesses. Each business needs its own agent, and its setup already lives in your database, like a platform that gives every client its own phone receptionist.

2. You need deep customization per customer. When the whole agent changes per customer (the prompt, the tools, the flow), not just a name or account number, transient lets you build that custom agent fresh on each call.

Here’s the B2B case in practice. Say you build phone receptionists for home-services companies: plumbers, electricians, HVAC. Each one has its own greeting, services, booking rules, and tone, all stored as a config per business. That config is your source of truth.

With persistent, you’d copy each config into a saved Vapi assistant and keep both in sync forever. With transient, you skip the copy: a call comes in, you read that business’s config from your database, and hand it to Vapi for the call.

It stays simple as you grow:

  • Add a business → add a row.

  • Remove a business → delete a row.

  • No saved agents, no syncing, no cleanup.
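
To make "add a row, delete a row" concrete, here is a small sketch that uses an in-memory `Map` as a stand-in for your database. The field names (`id`, `name`, `systemPrompt`, `voice`, `tools`) and the voice values are assumptions for illustration, not a required schema.

```javascript
// Hypothetical stand-in for your database: one entry per business.
const businesses = new Map();

// Adding a business is one insert.
function addBusiness(config) {
  businesses.set(config.id, config);
}

// Removing a business is one delete. Returns true if the row existed.
function removeBusiness(id) {
  return businesses.delete(id);
}

addBusiness({
  id: "acme-plumbing",
  name: "Acme Plumbing",
  systemPrompt: "You are the receptionist for Acme Plumbing. Book repair visits.",
  voice: { provider: "11labs", voiceId: "placeholder-voice-id" }, // assumed shape
  tools: [],
});

addBusiness({
  id: "bright-electric",
  name: "Bright Electric",
  systemPrompt: "You are the receptionist for Bright Electric. Take service requests.",
  voice: { provider: "11labs", voiceId: "placeholder-voice-id" }, // assumed shape
  tools: [],
});

removeBusiness("bright-electric");

console.log([...businesses.keys()]); // [ 'acme-plumbing' ]
```

There is no second copy to update on the platform side, which is the point of the list above.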

How to Build It (Step by Step)

Here’s one simple version: an inbound flow in Node, where a call comes in and your server decides which agent to run. (Python is almost identical.) The whole idea: Vapi doesn’t store your agent; your server hands it over the moment a call needs it.

Step 1: Build the skeleton

A tiny server with two empty pieces: a function that builds an agent and a webhook Vapi will call. You fill both in next.

import express from "express";
const app = express();
app.use(express.json());

// (A) Turns a business into a Vapi agent. Fill this in.
function buildAssistant(business, customer) {
  // ...
}

// (B) Vapi calls this on every incoming call. Fill this in.
app.post("/vapi/webhook", async (req, res) => {
  const { message } = req.body ?? {}; // tolerate requests with no JSON body
  if (message?.type === "assistant-request") {
    // pick the agent and return it
  }
  return res.json({}); // acknowledge everything else
});

app.listen(8000, () => console.log("listening on :8000"));
Step 2: Turn a business into an agent

Given a stored config, return a Vapi agent. Everything specific to this business (prompt, tools, voice) comes from your database here.

function buildAssistant(business, customer) {
  return {
    firstMessage: `Thanks for calling ${business.name}! How can I help?`,
    model: {
      provider: "openai",
      model: "gpt-4o",
      messages: [{ role: "system", content: business.systemPrompt }],
      tools: business.tools,        // this business's tools
    },
    voice: business.voice,          // this business's voice
  };
}

Nothing here is saved to Vapi. You build the agent in memory the moment it’s needed, then throw it away when the call ends. The only place the real agent lives is your database.
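
The `customer` argument is where per-caller personalization can go. Here is a hedged sketch of one way to use it: greet a known caller by name and fall back to the generic greeting otherwise. The function name `buildPersonalizedAssistant` and the `customer.firstName` field are hypothetical; use whatever your CRM returns.

```javascript
// Variant of buildAssistant that also uses the caller record.
// `customer` may be null when the caller is unknown.
function buildPersonalizedAssistant(business, customer) {
  const greeting = customer?.firstName
    ? `Hi ${customer.firstName}, thanks for calling ${business.name}! How can I help?`
    : `Thanks for calling ${business.name}! How can I help?`;

  return {
    firstMessage: greeting,
    model: {
      provider: "openai",
      model: "gpt-4o",
      messages: [{ role: "system", content: business.systemPrompt }],
      tools: business.tools,
    },
    voice: business.voice,
  };
}

// Sample data for illustration only.
const sampleBusiness = {
  name: "Acme Plumbing",
  systemPrompt: "You are the receptionist for Acme Plumbing.",
  tools: [],
  voice: { provider: "11labs", voiceId: "placeholder-voice-id" }, // assumed shape
};

const known = buildPersonalizedAssistant(sampleBusiness, { firstName: "Dana" });
const unknown = buildPersonalizedAssistant(sampleBusiness, null);

console.log(known.firstMessage);   // Hi Dana, thanks for calling Acme Plumbing! How can I help?
console.log(unknown.firstMessage); // Thanks for calling Acme Plumbing! How can I help?
```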

Step 3: Point Vapi at your server

On the phone number’s settings in Vapi there’s a Server URL field. Set it to your endpoint, and add a query param so you know which business the number belongs to:


The Server URL field on a Vapi phone number, with a business id in the query string

Each business’s number uses the same endpoint with a different id. Now every call to that number sends a request to your server, with the id attached.
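
As a sketch of how those per-number URLs could be generated, the helper below appends the business id as an `id` query parameter. The base URL `https://example.com/vapi/webhook` and the function names are placeholders; substitute your own domain and path.

```javascript
// Hypothetical base URL for your webhook; replace with your own domain.
const WEBHOOK_BASE = "https://example.com/vapi/webhook";

// Build the Server URL for one business by adding ?id=<businessId>.
function serverUrlFor(businessId) {
  const url = new URL(WEBHOOK_BASE);
  url.searchParams.set("id", businessId); // also handles URL-encoding
  return url.toString();
}

// Read the id back out, as your server would from the incoming request.
function businessIdFrom(serverUrl) {
  return new URL(serverUrl).searchParams.get("id");
}

const acmeUrl = serverUrlFor("acme-plumbing");
console.log(acmeUrl);                 // https://example.com/vapi/webhook?id=acme-plumbing
console.log(businessIdFrom(acmeUrl)); // acme-plumbing
```

In the Express handler, the same value is available as `req.query.id`.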

Step 4: Return the agent when Vapi asks

When Vapi calls your endpoint, you read the id, pull the business from your database, look up the caller, build the agent, and send it back.

app.post("/vapi/webhook", async (req, res) => {
  const { message } = req.body ?? {}; // tolerate requests with no JSON body
  if (message?.type === "assistant-request") {
    const businessId = req.query.id;                     // from the Server URL ?id=...
    const callerNumber = message.call?.customer?.number; // who's calling

    // `db` and `crm` stand in for your own data layer and CRM client.
    const business = await db.getBusiness(businessId);   // your source of truth
    const customer = await crm.lookup(business, callerNumber);

    return res.json({ assistant: buildAssistant(business, customer) });
  }
  return res.json({}); // acknowledge everything else
});

That’s the whole loop. Your database holds the truth, your function builds the agent, Vapi runs it for one call.
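
To try the loop without a real database, here is a sketch with in-memory stand-ins for `db` and `crm`, and with the handler logic pulled out of Express so it can be called directly. Everything named here (the stubs, `handleAssistantRequest`, the sample phone number) is hypothetical. Returning an `error` string for an unknown business is an assumption about what the assistant-request response accepts, so check it against Vapi's docs before relying on it.

```javascript
// Hypothetical in-memory stand-ins for the `db` and `crm` objects.
const db = {
  rows: new Map([
    ["acme-plumbing", {
      name: "Acme Plumbing",
      systemPrompt: "You are the receptionist for Acme Plumbing.",
      tools: [],
      voice: { provider: "11labs", voiceId: "placeholder-voice-id" }, // assumed shape
    }],
  ]),
  async getBusiness(id) {
    return this.rows.get(id) ?? null;
  },
};

const crm = {
  async lookup(business, phoneNumber) {
    return phoneNumber === "+15555550100" ? { firstName: "Dana" } : null;
  },
};

function buildAssistant(business, customer) {
  return {
    firstMessage: `Thanks for calling ${business.name}! How can I help?`,
    model: {
      provider: "openai",
      model: "gpt-4o",
      messages: [{ role: "system", content: business.systemPrompt }],
      tools: business.tools,
    },
    voice: business.voice,
  };
}

// Same steps as the webhook: read the business, look up the caller, build the agent.
async function handleAssistantRequest(businessId, message) {
  const business = await db.getBusiness(businessId);
  if (!business) {
    // Assumption: an `error` string in the response is spoken to the caller.
    return { error: "Sorry, this number is not set up yet." };
  }
  const customer = await crm.lookup(business, message.call?.customer?.number);
  return { assistant: buildAssistant(business, customer) };
}

const sampleMessage = {
  type: "assistant-request",
  call: { customer: { number: "+15555550100" } },
};

handleAssistantRequest("acme-plumbing", sampleMessage).then((response) => {
  console.log(response.assistant.firstMessage);
  // Thanks for calling Acme Plumbing! How can I help?
});
```

Keeping the logic in a plain async function also makes it easy to unit test without starting a server.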

A few things worth knowing:

  • The lookup delay is fine. This happens while the phone is still ringing. A quick database read plus a caller lookup usually finishes in a ring or two, and the caller never notices.

  • But keep it fast. Vapi gives you about 7.5 seconds to respond before it drops the call. Need a caller’s full history? Grab the light stuff here, fetch the rest mid-call with a tool.

  • You’re not “creating an agent” each time. The agent is just data in your response. Vapi’s infrastructure is already running and shared; it doesn’t boot anything up for you, so there’s no cold start to worry about.

  • Working example: a full server (Node + Python) with the webhook, lookup, and mid-call tools, in one repo: github.com/usetuner/vapi-multiagent.
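
One way to stay inside the response window is to cap the slower lookup and fall back to a default if it runs long. The sketch below races a promise against a timer. `withTimeout` and `slowCrmLookup` are hypothetical helpers, and the millisecond values are arbitrary demo numbers rather than recommendations.

```javascript
// Resolve with `fallback` if `promise` takes longer than `ms` milliseconds.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical CRM call, simulated with a delay.
function slowCrmLookup(delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ firstName: "Dana" }), delayMs);
  });
}

async function demo() {
  // Finishes well inside its budget, so the real result comes back.
  const fast = await withTimeout(slowCrmLookup(10), 200, null);
  // Exceeds its budget, so the fallback (null) is used instead.
  const slow = await withTimeout(slowCrmLookup(500), 50, null);
  return { fast, slow };
}

demo().then(({ fast, slow }) => {
  console.log(fast); // { firstName: 'Dana' }
  console.log(slow); // null
});
```

With a `null` customer the agent can still answer with the generic greeting, and a mid-call tool can fetch the caller's details later.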

So Is This a Good Idea?

For multi-business voice AI, yes. Here’s the honest trade.

Pros

  • Your database is the one source of truth

  • No saved agents to create, sync, or clean up

  • Add or remove a business with one row

  • Scales cleanly to hundreds of businesses

  • Vapi still handles the hard realtime work

Cons

  • You run and maintain your own server

  • You pay Vapi’s per-minute fee on top of your infra

  • Calls aren’t grouped by an agent ID, so they’re harder to track

  • Config travels in each request (keep secrets out of it)


Our take: for anyone building voice agents for other businesses, transient is the right default. You keep full control of the part that’s actually your product (the prompts, tools, and logic) and let Vapi own the hard realtime engine. You rent the engine, you own the brain.

The One Catch: You Lose Sight of Your Calls

With transient, none of your calls are tied to a saved agent. They all pile up in one place, with no easy way to tell which call belonged to which business.

This is where Tuner comes in. Tuner is the analytics and testing layer for building reliable voice agents. It watches your calls for what’s working and what’s failing (hallucinations, broken flows, missed intents) and lets you test and simulate agents before they take a live call.


Tuner's overview for an agent, success and red-flag rates with call intent and outcome breakdowns

On transient, we also pull the scattered calls back together into one view split by customer, use case, or prompt version, so all of this applies even though Vapi never saved an agent.


What Tuner adds over running without it:

  • Calls grouped by customer & use case

  • Per-call monitoring with alerts & red flags

  • Analytics built around your use case

  • Test & simulate agents before going live

The Tuner-Vapi transient integration is how the calls get there.

The Takeaway

The whole thing in three lines:

  • Transient means your database holds the agent, not Vapi. You send the full setup on every call.

  • Use it for B2B (many businesses), or for deep per-customer customization.

  • Transient scatters your calls, Tuner brings them back. One view grouped by customer and use case, with monitoring, analytics, and testing on top.

Building on the transient flow? We’d like to hear how you handle visibility today. The integration is a single drop-in file, covered in the integration docs, if you want to try it.

Tuner helps voice AI teams test, monitor, and debug calls before issues reach users.