Build Your First Voice Agent

This guide explains how to create a Voice AI Agent on the Flametree platform. By the end, you will have a working agent that can handle calls, understand speech, and reply with a natural-sounding voice.

Your Voice AI agent will:

Make and receive phone calls via SIP
Follow a defined conversation flow (workflow)
Collect key details like the user’s name, intent, and callback time

Prerequisites

You need a configured SIP integration before creating a Voice AI agent. If you haven’t set it up yet, follow the SIP integration guide. This integration allows your agent to make and receive calls through the SIP protocol.

Configuration Overview

To create and configure a Voice AI Agent, you will complete these five steps:

Create a Voice Agent – start a new agent in the dashboard.
Configure Core Agent Settings, Workflow, and Models – define identity, task, speech style, and conversation flow, and choose LLM, TTS, and STT engines.
Connect SIP Integration – link the agent to a voice channel.
Set Max Open Sessions – limit the number of simultaneous outbound calls.
Configure Environment Variables – fine-tune timing, recognition, and session behavior.

Step 1: Create a Voice Agent

Go to the AI Agents section in your Flametree dashboard.
Click Create new agent.
Select the Voice Agent type:
- Inbound Voice Agent – answers incoming calls
- Outbound Voice Agent – makes calls and can also receive them

Tip: Outbound agents are useful for campaigns, surveys, and callback workflows. They can also handle inbound calls automatically.

Step 2: Configure Core Agent Settings, Workflow, and Models

Set up how your Voice Agent looks, speaks, and thinks. In this step, you’ll define its identity, conversation flow, and AI models.

Main parameters

Identity – the agent’s name or role
Speech Style and Language – how the agent talks
Task – the agent’s main purpose or goal

Workflow (conversation logic)

The Workflow defines your voice agent’s conversation steps. Update the description section to outline how the dialogue should progress. Learn more about Workflow configuration.
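In the dashboard, the Workflow is written as a description, not as code. To make the idea of "conversation steps" concrete, the minimal Python sketch below models the flow this guide targets (name, intent, callback time) as ordered slots, where the agent always asks for the first detail it has not collected yet. `WorkflowStep`, `CallState`, and the prompts are hypothetical illustrations, not the Flametree Workflow format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class WorkflowStep:
    """A hypothetical conversation step: the slot it fills and the question that fills it."""
    slot: str
    prompt: str


# Hypothetical flow mirroring the details this guide's agent collects.
WORKFLOW = (
    WorkflowStep("name", "May I have your name, please?"),
    WorkflowStep("intent", "How can I help you today?"),
    WorkflowStep("callback_time", "When would be a good time to call you back?"),
)


@dataclass
class CallState:
    collected: dict = field(default_factory=dict)

    def next_step(self) -> Optional[WorkflowStep]:
        """Return the first step whose slot is still empty, or None when the flow is done."""
        for step in WORKFLOW:
            if step.slot not in self.collected:
                return step
        return None


# Simulated caller answers; in a real call these would come from STT.
answers = {"name": "Alex", "intent": "billing question", "callback_time": "tomorrow at 10:00"}

state = CallState()
while (step := state.next_step()) is not None:
    print(f"Agent: {step.prompt}")
    state.collected[step.slot] = answers[step.slot]

print(state.collected)
```

Because `next_step` looks for the first unfilled slot, a caller who volunteers several details in one sentence would simply skip the steps already satisfied.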

Models

In the Models section, choose which models your agent will use to understand and respond during calls.

Category	Description	Example
LLM	Main language model that drives reasoning and response generation	`gpt-4.1`, `qwen2.5-instruct`, `qwen3-instruct`
Text-to-Speech (TTS)	Converts text into natural-sounding voice	Female Azure TTS
Speech-to-Text (STT)	Converts caller speech into text	Flametree Whisper

⚡ Note: The Voice Agent supports only instruct models (for example, `gpt-4.1`, `qwen2.5-instruct`, `qwen3-instruct`). LLM speed directly affects how long callers wait for a reply, so choose the fastest model that meets your quality needs.
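The advice to pick the fastest model that still meets your quality bar can be expressed as a filter-then-minimize rule. In this sketch, the model names, quality scores, and latencies are placeholders you would replace with your own measurements; `ModelCandidate` and `pick_fastest` are hypothetical helpers, not part of the Flametree platform.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelCandidate:
    name: str
    quality: float          # your own evaluation score, 0.0-1.0 (hypothetical metric)
    median_latency_ms: int  # response latency you measured in test calls


def pick_fastest(candidates: Iterable[ModelCandidate], min_quality: float) -> Optional[ModelCandidate]:
    """Return the lowest-latency candidate whose quality meets min_quality, or None."""
    eligible = [c for c in candidates if c.quality >= min_quality]
    return min(eligible, key=lambda c: c.median_latency_ms, default=None)


# Placeholder numbers only; they are not benchmarks of any real model.
candidates = [
    ModelCandidate("model-a", quality=0.92, median_latency_ms=900),
    ModelCandidate("model-b", quality=0.85, median_latency_ms=450),
    ModelCandidate("model-c", quality=0.70, median_latency_ms=250),
]

choice = pick_fastest(candidates, min_quality=0.8)
print(choice.name if choice else "no model meets the bar")  # model-b
```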

Step 3: Connect SIP Integration

Your Voice Agent uses SIP to make or receive calls. To connect an existing integration:

Go to the Communication Channels section.
Click the + (plus) button.
Select SIP.
Select the SIP integration you created earlier.
Click Save.

Step 4: Set Max Open Sessions

In Advanced Settings in the right panel, set Max Opened Sessions. This value limits the number of simultaneous outbound calls the agent can make.
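Conceptually, a session cap behaves like a counting semaphore: once the limit is reached, further outbound calls wait until an active one ends. The asyncio sketch below illustrates only that behavior; `MAX_OPENED_SESSIONS`, `place_call`, and the simulated call duration are hypothetical, and Flametree may queue or reject excess calls differently.

```python
import asyncio

MAX_OPENED_SESSIONS = 2  # hypothetical value of the Max Opened Sessions setting


async def place_call(number: str, sessions: asyncio.Semaphore, stats: dict) -> None:
    """Simulate one outbound call that holds a session slot while it runs."""
    async with sessions:  # waits here while MAX_OPENED_SESSIONS calls are active
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the actual conversation
        stats["active"] -= 1
        stats["completed"] += 1


async def run_campaign(numbers: list[str]) -> dict:
    """Dial every number, never exceeding the session cap at any moment."""
    sessions = asyncio.Semaphore(MAX_OPENED_SESSIONS)
    stats = {"active": 0, "peak": 0, "completed": 0}
    await asyncio.gather(*(place_call(n, sessions, stats) for n in numbers))
    return stats


stats = asyncio.run(run_campaign([f"+1555000{i:04d}" for i in range(5)]))
print(stats)  # peak never exceeds MAX_OPENED_SESSIONS
```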

Step 5: Configure Environment Variables

Use environment variables to fine-tune how your Voice Agent behaves during calls. These parameters control timing, recognition quality, session behavior, and call flow.

Core Parameters

These variables define how the Voice Agent operates and interacts with callers.

Variable	Description	Recommended value	Required / Recommended
`CODECS_PRIORITY`	List or dictionary defining the preferred audio codecs for SIP calls. Leave `{}` for automatic negotiation.	`{}`	Required
`SESSION_TIMEOUT_SEC`	Time (in seconds) before a session closes after the last message. Should not be shorter than the longest speech segment in the dialogue.	`120`	Required
`WHISPER_LANGUAGE`	Language code for Whisper STT (use if you know the expected caller language for better recognition).	`en`	Recommended
`USER_SALIENCE_TIMEOUT_MS`	Time (in milliseconds) before the session closes after the last human message. Should not be shorter than the longest AI speech segment in the dialogue.	`100000`	Recommended
`INTERRUPTIONS_ARE_ALLOWED`	Allow users to interrupt AI speech.	`False`	Recommended
`START_PHRASE`	Agent’s opening phrase. Keeps greetings consistent.	`"Hello, this is Anna. How can I assist you today?"`	Recommended
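The two timeout variables use different units (seconds and milliseconds), which is easy to get wrong. The sketch below writes the core variables as a plain dictionary of strings and checks both timeouts against the guidance in the table. The function name, the `longest_speech_sec` / `longest_ai_speech_sec` inputs (which you would measure from test calls), and the string encoding of values are assumptions; this guide does not say how the platform parses values such as `False` versus `false`.

```python
# Core variables from the table above, expressed as strings (assumed encoding).
CORE_ENV = {
    "CODECS_PRIORITY": "{}",
    "SESSION_TIMEOUT_SEC": "120",
    "WHISPER_LANGUAGE": "en",
    "USER_SALIENCE_TIMEOUT_MS": "100000",
    "INTERRUPTIONS_ARE_ALLOWED": "False",
    "START_PHRASE": "Hello, this is Anna. How can I assist you today?",
}


def check_timeouts(env: dict[str, str], longest_speech_sec: float, longest_ai_speech_sec: float) -> list[str]:
    """Warn when a timeout is shorter than the speech it is supposed to outlast."""
    warnings = []

    session_timeout_sec = float(env["SESSION_TIMEOUT_SEC"])  # already in seconds
    if session_timeout_sec < longest_speech_sec:
        warnings.append(
            f"SESSION_TIMEOUT_SEC ({session_timeout_sec:g} s) is shorter than "
            f"the longest speech segment ({longest_speech_sec:g} s)"
        )

    if "USER_SALIENCE_TIMEOUT_MS" in env:
        salience_sec = float(env["USER_SALIENCE_TIMEOUT_MS"]) / 1000  # ms -> s
        if salience_sec < longest_ai_speech_sec:
            warnings.append(
                f"USER_SALIENCE_TIMEOUT_MS ({salience_sec:g} s) is shorter than "
                f"the longest AI speech ({longest_ai_speech_sec:g} s)"
            )

    return warnings


print(check_timeouts(CORE_ENV, longest_speech_sec=45, longest_ai_speech_sec=30))  # []
```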

Voice Detection and Timing

These variables adjust how the Voice Activity Detection (VAD) system detects speech and pauses.

Variable	Description	Default	Required / Recommended
`VAD_THRESHOLD`	Speech detection threshold. Higher values make detection less sensitive.	`0.65`	Recommended
`VAD_SPEECH_PROB_WINDOW`	Window size for calculating speech probability.	`3`	Recommended
`VAD_MIN_SPEECH_DURATION_MS`	Minimum duration (in milliseconds) for a sound to register as valid speech.	`250`	Recommended
`VAD_MIN_SILENCE_DURATION_MS`	Minimum silence duration (in milliseconds) to register as a pause.	`350`	Recommended
`VAD_SPEECH_PAD_MS`	Additional buffer (in milliseconds) before and after speech segments.	`700`	Recommended
`LONG_PAUSE_OFFSET_MS`	Duration (in milliseconds) that counts as a long pause, used to detect whether the caller intends to end or wait.	`850`	Recommended
`SHORT_PAUSE_OFFSET_MS`	Duration (in milliseconds) that counts as a short pause, used for natural conversation pacing.	`200`	Recommended
`VAD_CORRECTION_ENTER_THRESHOLD`	Threshold for entering speech detection mode.	`0.6`	Recommended
`VAD_CORRECTION_EXIT_THRESHOLD`	Threshold for exiting speech detection mode.	`0.35`	Recommended
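The detection variables interact in a way that is easier to see in code. The following is a simplified, frame-based segmenter in the style of common VAD post-processing (Silero VAD's speech-timestamp extraction uses similarly named minimum-speech, minimum-silence, and padding parameters). Speech starts when the smoothed probability reaches the enter threshold and continues until it stays below the lower exit threshold for at least the minimum silence duration. Segments shorter than the minimum speech duration are discarded, and the rest are padded and merged. This is a sketch of the general hysteresis technique, not Flametree's implementation: the 30 ms frame length and the reading of `VAD_SPEECH_PROB_WINDOW` as a frame count are assumptions, and `VAD_THRESHOLD` and the pause offsets are not modeled because this guide does not describe how they combine with the correction thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VadConfig:
    # Defaults copied from the table above; frame_ms and the unit of prob_window are assumptions.
    enter_threshold: float = 0.6   # VAD_CORRECTION_ENTER_THRESHOLD
    exit_threshold: float = 0.35   # VAD_CORRECTION_EXIT_THRESHOLD
    prob_window: int = 3           # VAD_SPEECH_PROB_WINDOW (assumed to count frames)
    min_speech_ms: int = 250       # VAD_MIN_SPEECH_DURATION_MS
    min_silence_ms: int = 350      # VAD_MIN_SILENCE_DURATION_MS
    speech_pad_ms: int = 700       # VAD_SPEECH_PAD_MS
    frame_ms: int = 30             # assumed duration of one probability frame


def smooth(probs: list[float], window: int) -> list[float]:
    """Trailing moving average over the last `window` frames."""
    out = []
    for i in range(len(probs)):
        chunk = probs[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def detect_segments(probs: list[float], cfg: VadConfig = VadConfig()) -> list[tuple[int, int]]:
    """Turn per-frame speech probabilities into padded (start_ms, end_ms) segments."""
    total_ms = len(probs) * cfg.frame_ms
    raw: list[tuple[int, int]] = []
    in_speech = False
    start_ms = 0
    silence_start_ms: Optional[int] = None

    for i, p in enumerate(smooth(probs, cfg.prob_window)):
        t = i * cfg.frame_ms
        if not in_speech:
            if p >= cfg.enter_threshold:  # rise to the higher threshold to start
                in_speech, start_ms, silence_start_ms = True, t, None
        elif p < cfg.exit_threshold:  # below the lower threshold: a pause may be starting
            if silence_start_ms is None:
                silence_start_ms = t
            if t + cfg.frame_ms - silence_start_ms >= cfg.min_silence_ms:
                raw.append((start_ms, silence_start_ms))
                in_speech, silence_start_ms = False, None
        else:
            silence_start_ms = None  # speech resumed before the pause was long enough

    if in_speech:  # stream ended mid-speech
        raw.append((start_ms, silence_start_ms if silence_start_ms is not None else total_ms))

    merged: list[tuple[int, int]] = []
    for s, e in raw:
        if e - s < cfg.min_speech_ms:
            continue  # too short to count as speech
        s, e = max(0, s - cfg.speech_pad_ms), min(total_ms, e + cfg.speech_pad_ms)
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))  # padded segments overlap
        else:
            merged.append((s, e))
    return merged


# 5 silent frames, 15 speech frames, 20 silent, a 3-frame blip, 20 silent (30 ms each).
probs = [0.0] * 5 + [1.0] * 15 + [0.0] * 20 + [1.0] * 3 + [0.0] * 20
print(detect_segments(probs))                              # [(0, 1330)]
print(detect_segments(probs, VadConfig(speech_pad_ms=0)))  # [(180, 630)]
```

The gap between the enter and exit thresholds is what keeps a single dip in probability mid-word from splitting one utterance into two.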

SIP and Logging Parameters

These settings control SIP signaling and logging verbosity.

Variable	Description	Default	Required / Recommended
`SIP_EARLY_EOC`	Enables early end-of-call signaling in SIP. Usually disabled.	`false`	Recommended
`PJSIP_LOG_LEVEL`	SIP log verbosity. Higher values produce more detailed logs.	`4`	Recommended

Recommended Setup

At minimum, set the following:

CODECS_PRIORITY
SESSION_TIMEOUT_SEC

It is strongly recommended to also set:

WHISPER_LANGUAGE
USER_SALIENCE_TIMEOUT_MS
INTERRUPTIONS_ARE_ALLOWED
START_PHRASE
SESSION_VERSION

All other parameters can stay at default values unless you need fine-tuning.
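The checklist above can be turned into a quick audit of the variables you plan to set. This is a hypothetical helper, assuming you collect the variables into a Python dictionary of strings before entering them on the platform; `SESSION_VERSION` is only checked for presence because this guide does not describe its value.

```python
# Variable names taken from the lists above; values are whatever you plan to set.
REQUIRED = ("CODECS_PRIORITY", "SESSION_TIMEOUT_SEC")
RECOMMENDED = (
    "WHISPER_LANGUAGE",
    "USER_SALIENCE_TIMEOUT_MS",
    "INTERRUPTIONS_ARE_ALLOWED",
    "START_PHRASE",
    "SESSION_VERSION",
)


def audit_env(env: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (missing_required, missing_recommended); empty strings count as missing."""
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_recommended = [name for name in RECOMMENDED if not env.get(name)]
    return missing_required, missing_recommended


planned = {"CODECS_PRIORITY": "{}", "SESSION_TIMEOUT_SEC": "120", "WHISPER_LANGUAGE": "en"}
missing_required, missing_recommended = audit_env(planned)
print(missing_required)     # []
print(missing_recommended)  # ['USER_SALIENCE_TIMEOUT_MS', 'INTERRUPTIONS_ARE_ALLOWED', 'START_PHRASE', 'SESSION_VERSION']
```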

Your Voice Agent is now ready to handle calls 🚀
