Voice Agent with SIP or Twilio

In this tutorial you build a voice agent in Flametree that answers inbound phone calls and holds a spoken conversation. You can connect the phone line through your own telephony with the SIP channel, or through a Twilio number — the tutorial covers both as alternative options in one path.

The example continues with EventHub, the fictional online-events company from the inbound tutorial — this time callers phone an EventHub support number and talk to the agent. By the end you have:

  • A voice-capable agent with the speech setup its channel needs.
  • A connected voice channel — SIP or Twilio — attached to the agent with Inbound on.
  • A tested inbound call, answered by the agent.
  • The call reviewed in Sessions as a transcript, with the recording downloaded for SIP.

Plan for about an hour in the portal, plus provider-side setup time for the SIP account or the Twilio number. Each stage gives only the steps you need and links to the page that covers that screen in full.

SIP or Twilio — pick one

The two channels reach the same result through different infrastructure. The biggest difference: the SIP channel relies on speech model connections you set up in Flametree, while the Twilio channel uses Twilio's built-in speech and needs none. Stage 2 explains the split, then Stage 3 forks into Option A: SIP and Option B: Twilio.

Before you start

  • You are signed in to the portal with a role that can edit agents and settings. If a button in these stages is disabled, ask your administrator.
  • An LLM connection exists in Settings > Connectivity — without a selected LLM the agent cannot run.
  • For SIP: a SIP account the platform can register as, with a phone number routed to it. See SIP.
  • For Twilio: a Twilio account, a voice-capable Twilio number, and the Account SID and Auth Token from the Twilio Console. See Twilio.

Stage 1: Prepare the voice agent

A voice agent is a regular agent that you configure in Advanced mode, because the speech models live there.

  1. Go to AI Agents > Agents, click Create new agent, and select Inbound Agent — the type for conversations where customers reach out first. Enter EventHub Voice in the Agent name field and click Create.
  2. On the agent page, turn on the Advanced mode toggle in the upper-right corner.
  3. Write the prompt sections — Identity, Speech style and language, and Task — for a phone assistant that helps callers with EventHub events. State the languages the agent should speak explicitly.
  4. Set a greeting so the caller hears something when the call connects: in the Advanced card, add the START_PHRASE environment variable with your opening line, for example Hello, you have reached EventHub. How can I help you today?

The speech models and the call-limit setting come next. Leave the agent stopped for now — you start it after the channel is attached.

The full page, the Models card, and Max opened sessions (which limits how many calls the agent handles at the same time): Advanced mode.

Stage 2: Set up speech, where the channel needs it

This is the decision that shapes the rest of the tutorial.

  • SIP brings the call; the agent does the speech. It needs a Speech-To-Text connection to transcribe the caller and a Text-To-Speech connection to read replies aloud.
  • Twilio provides the speech itself. You do not create speech connections for it.

If you chose Twilio, skip this stage and go to Stage 3, Option B.

If you chose SIP, create both connections now:

  1. Go to Settings > Connectivity.
  2. In the left panel, open AI Models, select Speech-to-text (STT), click Add, fill in the form, and click Save. Wait for the green status dot.
  3. Open AI Models > Text-to-speech (TTS), click Add, fill in the form, and click Save. Text-to-speech connections are not health-checked, so this card shows no status dot — you confirm it works with the test call in Stage 4.
  4. Back on the agent page from Stage 1, scroll to the Models card in the Settings panel, select your connections in the Speech-To-Text and Text-To-Speech rows, and click Save.

The exact fields for each speech connection: Settings > Connectivity. Attaching models to the agent: Advanced mode.

Stage 3, Option A: Connect the SIP channel

Channel connections are created once in Settings > Channels, then attached to the agent. A SIP connection has no Start/Stop buttons of its own — it starts and stops together with the agent it is attached to.

  1. Go to Settings > Channels, select SIP in the channel list, and click Add.
  2. In the New connector form, enter a Name and the credentials from your provider: Domain (the SIP server, for example sip.example.com), User, and Password. Fill in Login only if your provider uses an authentication login that differs from User, and Caller ID for the number shown on outbound calls.
  3. Click Save.
  4. Open the agent from Stage 1, find the Communication channels card, and click Add. Pick SIP and tick the checkbox next to your connection.
  5. Switch Inbound on so the agent answers incoming calls.
  6. Click Save, then restart the agent: Stop agent, wait for Stopped, then Start agent.
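
Before you type the credentials into the form, it can help to lay them out and check them once. The sketch below is illustrative only: the SipConnection class and its methods are hypothetical names, not part of Flametree. It assumes that Name, Domain, User, and Password are the required fields, and that an empty Login means the User value is also used to authenticate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SipConnection:
    """Hypothetical mirror of the New connector form fields."""
    name: str
    domain: str                      # the SIP server, for example sip.example.com
    user: str
    password: str
    login: Optional[str] = None      # only when it differs from User
    caller_id: Optional[str] = None  # number shown on outbound calls

    def auth_username(self) -> str:
        # Assumption: with no Login, the User value doubles as the auth login.
        return self.login or self.user

    def address_of_record(self) -> str:
        # Standard SIP URI form: sip:user@domain
        return f"sip:{self.user}@{self.domain}"

    def missing_fields(self) -> List[str]:
        # Assumption: these four fields are the required ones.
        required = {
            "Name": self.name,
            "Domain": self.domain,
            "User": self.user,
            "Password": self.password,
        }
        return [label for label, value in required.items() if not value.strip()]
```

For example, SipConnection("EventHub SIP", "sip.example.com", "1001", "secret").address_of_record() gives the identity your provider should recognize, which is worth comparing with the account details the provider sent you.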

When the agent starts, the channel registers with your SIP server using the connection's credentials. Registration errors appear in the agent's logs (three-dot menu > Show Logs). Only one agent answers inbound calls per SIP connection. Now go to Stage 4.

Credential fields, codecs, and registration details: SIP.

Stage 3, Option B: Connect the Twilio channel

Channel connections are created once in Settings > Channels, then attached to the agent. A Twilio connection has no Start/Stop buttons of its own — it starts and stops together with the agent it is attached to.

  1. Go to Settings > Channels, select Twilio in the channel list, and click Add.
  2. In the New connector form, enter a Name, your Account SID and Auth Token from the Twilio Console dashboard, and the Phone number in E.164 format (for example, +14155550123). Leave Voice model, Speech model, Region, and Edge empty to use Twilio's defaults.
  3. Click Save.
  4. Open the agent from Stage 1, find the Communication channels card, and click Add. Pick Twilio and tick the checkbox next to your connection.
  5. Switch Inbound on so the agent answers incoming calls.
  6. Click Save, then restart the agent: Stop agent, wait for Stopped, then Start agent.
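
The Phone number field expects E.164 format, so a quick check before saving can catch a missing plus sign or stray separators. This is a minimal sketch: is_e164 and normalize are hypothetical helper names, and the pattern encodes only the general E.164 shape (a plus sign followed by at most 15 digits, the first of which is not 0). It does not confirm that the number exists in your Twilio account.

```python
import re

# E.164 shape: "+", then 2 to 15 digits, where the first digit is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def normalize(raw: str) -> str:
    """Strip spaces, parentheses, dots, and hyphens; never adds a country code."""
    return re.sub(r"[\s().-]", "", raw)


def is_e164(number: str) -> bool:
    """Return True when the whole string matches the E.164 shape."""
    return E164_PATTERN.fullmatch(number) is not None
```

A number copied as "+1 (415) 555-0123" passes after normalize, while "4155550123" fails because the country code and plus sign are missing.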

When the agent starts, the channel authenticates with Twilio, finds the number, and points its voice webhook at Flametree — there is nothing to configure in the Twilio Console. Keep the number dedicated to this channel. Only one agent answers inbound calls per Twilio connection.

The channel takes over the number

Registering the webhook replaces whatever voice configuration the Twilio number had before. If something else reconfigures the number in the Twilio Console, calls stop reaching the agent until you restart it.
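
If you suspect that something has reconfigured the number, you can read its current voice webhook from outside the portal. The sketch below assumes Twilio's public REST API, where the IncomingPhoneNumbers resource accepts a PhoneNumber filter and returns a voice_url per number; the function names are hypothetical. This tutorial does not state the URL that Flametree registers, so treat the result as something to compare before and after an agent restart, not as a value to match against a known address.

```python
import base64
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.twilio.com/2010-04-01"


def build_lookup_url(account_sid: str, phone_number: str) -> str:
    """Build the list URL filtered to one number (the "+" is percent-encoded)."""
    query = urllib.parse.urlencode({"PhoneNumber": phone_number})
    return f"{API_BASE}/Accounts/{account_sid}/IncomingPhoneNumbers.json?{query}"


def extract_voice_urls(payload: dict) -> dict:
    """Map each returned phone number to its configured voice webhook URL."""
    return {
        item["phone_number"]: item.get("voice_url")
        for item in payload.get("incoming_phone_numbers", [])
    }


def fetch_voice_urls(account_sid: str, auth_token: str, phone_number: str) -> dict:
    """Network call: HTTP Basic auth with the Account SID and Auth Token."""
    request = urllib.request.Request(build_lookup_url(account_sid, phone_number))
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_voice_urls(json.load(response))
```

Only fetch_voice_urls touches the network. If the value it returns changes without an agent restart, something else has probably edited the number.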

Credential fields, the model and region options, and webhook behavior: Twilio.

Stage 4: Test a call

  1. Check that the agent is running. For SIP, look for a successful registration message in the agent's logs (three-dot menu > Show Logs).
  2. From any phone, call the number — the one routed to your SIP account, or your Twilio number.
  3. The agent answers, opens with the greeting from Stage 1, and replies according to its instructions. The agent detects when you start and stop speaking and replies after a natural pause.
  4. Speak over the agent mid-sentence: it stops, listens, and then processes the new input. On a SIP call, if you stay silent after the agent finishes speaking (for 60 seconds by default), the channel ends the call.
  5. Hang up. The session closes.

If the agent answers a SIP call but stays silent, it is most likely missing its speech models: recheck the Speech-To-Text and Text-To-Speech rows in the Models card and restart the agent. This is also where you confirm that the Text-To-Speech connection, which has no status dot in Connectivity, actually works.

The provider-specific test checklists: SIP and Twilio.

Stage 5: Review the call in Sessions

Every call becomes a session, the same as a chat conversation.

  1. In the left menu, open Sessions.
  2. Click Search, pick your agent in the Agents filter of the Session Filters dialog, and click Apply. You can also filter Channel by Sip (voice) or Twilio.
  3. Open the call: the conversation appears in the Chat panel as a transcript, message by message. The transcript can take a moment to process, so if the chat looks empty, wait briefly and refresh.
  4. Check what the agent collected in the Session results block of the Parameters & results panel, and open Logs under the chat to see how the call was handled.
  5. For a SIP call, download the recording: under the chat, next to Logs, click the button that reads Download sip audio. The file is saved as a WAV named with the session ID; the portal has no built-in player, so play it in any audio player.
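
Because the portal has no player, a short script can confirm that the downloaded file is a usable recording before you share it. This is a sketch using Python's standard wave module; describe_recording is a hypothetical helper name. It assumes the file is PCM-encoded, since the wave module raises wave.Error on compressed WAV variants, and the tutorial does not state which encoding the recording uses.

```python
import wave


def describe_recording(path: str) -> dict:
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as recording:
        frames = recording.getnframes()
        rate = recording.getframerate()
        return {
            "channels": recording.getnchannels(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Call it with the path of the downloaded file. A duration close to the length of your test call suggests the whole conversation was captured.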

Filters, the transcript, and the recording download in full: Sessions.

Optional: Place outbound calls at scale

The same voice agent can call customers, not only answer them — for example, to remind EventHub registrants about an upcoming event.

  1. Attach the channel to the agent with Inbound off if the agent should only place calls; with Inbound on, it can do both.
  2. Build a campaign flow with a Voice Communication action, which places an outbound call through a SIP voice agent. In the action, choose the Agent, pick the Phone field that holds each participant's number, enter the SIP Server to call through (with optional SIP Parameters), and set the schedule and Communication Duration for when the call goes out and how long it stays open.
  3. Activate the flow and start the campaign, then upload a participant list with a phone number per contact — follow the start-then-upload order from the Outbound WhatsApp Campaign tutorial, because the participant trigger fires only on upload while the campaign is Active. Each call appears in Sessions like the inbound test above.

The Voice Communication step, participant fields, and the upload flow: Flows. The order-of-operations rules for campaigns are covered in the Outbound WhatsApp Campaign tutorial.

What you built

  • A voice agent that answers calls to your EventHub number and speaks with callers.
  • The speech setup the channel needs: Speech-To-Text and Text-To-Speech connections for SIP, or Twilio's built-in speech for Twilio.
  • A voice channel attached with Inbound on, and a tested call.
  • The call in Sessions as a transcript — with the recording downloaded, for SIP.

Common issues

Symptom | Likely cause | Fix
Calls never arrive (SIP) | SIP registration failed | Open the agent's logs (three-dot menu > Show Logs) and check for registration errors, such as a wrong User, Login, Password, or Domain, and confirm the number is routed to this SIP account
Calls do not reach the agent (Twilio) | Webhook registration failed at startup | The Account SID or Auth Token is wrong, or the Phone number does not match a number in the account: fix the connection, Save, and restart the agent
The agent answers but stays silent (SIP) | Speech models missing | Select Speech-To-Text and Text-To-Speech in the Models card, then restart the agent
The call connects but the agent does not answer | Channel not attached, Inbound off, or no restart | Attach the connection, switch Inbound on, Save, and restart the agent
You cannot enable Inbound on a second agent | One agent answers inbound calls per connection | Create a separate SIP account or Twilio number with its own connection for the other agent
The chat looks empty for the call | Transcript still processing | Wait a moment and refresh the session
The recording download button is missing | Only voice sessions with a stored recording show it | Recording download is available for SIP calls: open a SIP voice session