Quick Start: Build Your First Voice Agent

This step-by-step guide walks you through creating a Voice AI Agent on the Flametree platform.
By the end of this tutorial, you’ll have a fully functional agent capable of handling inbound or outbound calls, understanding speech, and responding naturally with synthesized voice.

What You’ll Build

Your Voice AI agent will be able to:

Answer or initiate phone calls via SIP
Follow a defined conversation flow (workflow)
Collect key data such as customer name, intent, and callback time

Prerequisites

A SIP integration that is prepared and configured.
(If you don't have one yet, create it first.)

Step 1: Create a Voice Agent

Go to the AI Agents section in your Flametree dashboard.
Click Create Agent.
Choose the Voice Agent type:
- Inbound Voice Agent – answers incoming calls
- Outbound Voice Agent – initiates calls to users and can also handle inbound calls

Tip: Outbound agents are typically used for campaigns, surveys, and callback flows. Because they also handle inbound calls, choose this type if you need both directions.

Step 2: Define Main Agent Parameters

Set the core configuration parameters of your Voice Agent.

Identity
Speech Style and Language
Task: the agent’s high-level purpose and goals.

Step 3: Configure the Workflow (Conversation Logic)

The Workflow defines your voice agent’s conversation steps.
You can edit the description section in the workflow to specify how the conversation should proceed.

Step 4: Select Models

In the Models section, select the LLM, text-to-speech, and speech-to-text models your agent will use.

Category	Description	Example
LLM	Main language model	`gpt-4.1`, `qwen2.5-instruct`, `qwen3-instruct`
Text-to-Speech (TTS)	Converts text to voice output	Female Azure TTS
Speech-to-Text (STT)	Converts caller speech into text	Flametree Whisper

⚡ Note: The Voice Agent supports only instruct models (e.g., `gpt-4.1`, `qwen2.5-instruct`, `qwen3-instruct`).
LLM speed directly affects response delay, so choose the fastest model that meets your quality requirements.

Step 5: Connect SIP Integration

Your Voice Agent uses SIP to place and receive calls.

Configuration Steps

Go to the Communication Channels section.
Click the + (plus) button.
Select SIP.
Choose the SIP integration you prepared earlier.
Click Save.

Step 6: Set Max Opened Sessions

In the Advanced Settings section of the right panel, set Max Opened Sessions. This value limits the number of simultaneous outbound calls.
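
To illustrate what a cap on simultaneous sessions means in practice, here is a minimal Python sketch. The `SessionLimiter` class and its methods are hypothetical and are not part of any Flametree API. How the platform treats calls beyond the limit (reject, queue, or retry) is not described in this guide; the sketch simply refuses them.

```python
class SessionLimiter:
    """Hypothetical illustration of a Max Opened Sessions cap (not a Flametree API)."""

    def __init__(self, max_opened_sessions: int) -> None:
        if max_opened_sessions < 1:
            raise ValueError("max_opened_sessions must be at least 1")
        self.max_opened_sessions = max_opened_sessions
        self._open = set()  # IDs of calls currently holding a slot

    def try_open(self, call_id: str) -> bool:
        """Register a call if a slot is free; return False when the cap is reached."""
        if call_id in self._open:
            return True  # this call already holds a slot
        if len(self._open) >= self.max_opened_sessions:
            return False
        self._open.add(call_id)
        return True

    def close(self, call_id: str) -> None:
        """Free the slot held by a finished call."""
        self._open.discard(call_id)


limiter = SessionLimiter(max_opened_sessions=2)
results = [limiter.try_open(c) for c in ("call-1", "call-2", "call-3")]
limiter.close("call-1")                     # a slot becomes free
results.append(limiter.try_open("call-3"))  # the third call now fits
print(results)
```

With a cap of 2, the third call is refused until one of the first two ends. A higher cap increases throughput but also the load on your SIP trunk and models.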

Agent Customization

Configure additional environment variables to fine-tune how your Voice Agent behaves during live calls.
These parameters control timing, recognition quality, session behavior, and call flow.

Variable	Description	Recommended value
`CODECS_PRIORITY`	List or dictionary defining the preferred audio codecs for SIP calls. Leave as `{}` for automatic negotiation.	`{}`
`SESSION_TIMEOUT_SEC`	Time (in seconds) before a session is closed after the last message. Should not be shorter than the longest speech segment in the dialogue.	`120`
`WHISPER_LANGUAGE`	Language code for Whisper STT (use if you know the expected caller language for better recognition).	`en`
`USER_SALIENCE_TIMEOUT_MS`	Time (in milliseconds) before the session closes after the last human message. Should not be shorter than the longest AI speech in the dialogue.	`100000`
`INTERRUPTIONS_ARE_ALLOWED`	Allow the human speaker to interrupt AI speech.	`False`
`START_PHRASE`	Fixed opening phrase for the agent. Helps control token usage and ensures consistent greetings.	`"Hello, this is Anna. How can I assist you today?"`
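
The two timeout descriptions above each state a lower bound, and the two variables use different units (seconds and milliseconds). The following Python sketch checks both bounds before you save the values. The function `check_timeouts` and the two duration estimates it takes are hypothetical; you would supply the estimates from your own dialogue design.

```python
def check_timeouts(session_timeout_sec, user_salience_timeout_ms,
                   longest_speech_segment_sec, longest_agent_speech_sec):
    """Return warnings for timeouts shorter than the speech they must outlast.

    The last two arguments are your own estimates (assumptions), in seconds.
    """
    warnings = []
    if session_timeout_sec < longest_speech_segment_sec:
        warnings.append(
            f"SESSION_TIMEOUT_SEC={session_timeout_sec} is shorter than the "
            f"longest speech segment ({longest_speech_segment_sec} s)"
        )
    # USER_SALIENCE_TIMEOUT_MS is in milliseconds, so convert before comparing.
    longest_agent_speech_ms = longest_agent_speech_sec * 1000
    if user_salience_timeout_ms < longest_agent_speech_ms:
        warnings.append(
            f"USER_SALIENCE_TIMEOUT_MS={user_salience_timeout_ms} is shorter than "
            f"the longest AI speech ({longest_agent_speech_ms} ms)"
        )
    return warnings


# Recommended values from the table, with assumed speech lengths of 45 s and 30 s.
ok = check_timeouts(120, 100000, 45, 30)
# Deliberately short values to show both warnings.
too_short = check_timeouts(20, 15000, 45, 30)
print(ok)
for warning in too_short:
    print(warning)
```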

Voice Detection and Timing Parameters

These settings control the Voice Activity Detection (VAD) system, which determines when the caller's speech starts, pauses, and ends.

Variable	Description	Default
`VAD_THRESHOLD`	Sensitivity threshold for detecting speech. Higher = less sensitive.	`0.65`
`VAD_SPEECH_PROB_WINDOW`	Window size for calculating speech probability.	`3`
`VAD_MIN_SPEECH_DURATION_MS`	Minimum speech duration to register as valid speech.	`250`
`VAD_MIN_SILENCE_DURATION_MS`	Minimum silence duration to register as pause.	`350`
`VAD_SPEECH_PAD_MS`	Additional buffer (in ms) before and after speech segments.	`700`
`LONG_PAUSE_OFFSET_MS`	Defines a long pause threshold (used to detect intent to end or wait).	`850`
`SHORT_PAUSE_OFFSET_MS`	Defines short pause timing (used for natural conversation pacing).	`200`
`VAD_CORRECTION_ENTER_THRESHOLD`	Threshold for entering speech detection mode.	`0.6`
`VAD_CORRECTION_EXIT_THRESHOLD`	Threshold for exiting speech detection mode.	`0.35`
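
Separate enter and exit thresholds are a common way to keep a detector from flickering when the speech probability hovers near a single cutoff. The Python sketch below shows that general idea using the default values `0.6` and `0.35`. It is a generic illustration, not Flametree's implementation: the function `segment_speech`, the per-frame probability input, and the sample values are all assumptions, and the minimum-duration and padding parameters are not modeled.

```python
def segment_speech(probs, enter=0.6, exit_=0.35):
    """Group frames into speech segments using two thresholds (hysteresis).

    probs: per-frame speech probabilities (hypothetical input).
    Returns (start, end) frame index pairs; end is exclusive.
    """
    segments = []
    start = None  # index of the first frame of the current segment, if any
    for i, p in enumerate(probs):
        if start is None and p >= enter:
            start = i                      # probability rose above the enter threshold
        elif start is not None and p < exit_:
            segments.append((start, i))    # probability fell below the exit threshold
            start = None
    if start is not None:
        segments.append((start, len(probs)))  # speech still active at the end
    return segments


probs = [0.1, 0.7, 0.5, 0.4, 0.8, 0.2, 0.1, 0.65, 0.3]
with_hysteresis = segment_speech(probs)                       # enter 0.6, exit 0.35
single_cutoff = segment_speech(probs, enter=0.6, exit_=0.6)   # one threshold for both
print(with_hysteresis)
print(single_cutoff)
```

With two thresholds, the dip to 0.5 and 0.4 in the middle of the first utterance stays inside one segment. With a single cutoff of 0.6, the same input is split into more, shorter segments.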

SIP and Logging Parameters

Variable	Description	Default
`SIP_EARLY_EOC`	Enables early end-of-call signaling in SIP. Usually left disabled.	`false`
`PJSIP_LOG_LEVEL`	Verbosity level for SIP library logs. Higher value = more detailed logs.	`4`

Recommended Configuration Summary

At minimum, you must set:

CODECS_PRIORITY
SESSION_TIMEOUT_SEC

It is strongly recommended to also set:

WHISPER_LANGUAGE
USER_SALIENCE_TIMEOUT_MS
INTERRUPTIONS_ARE_ALLOWED
START_PHRASE
SESSION_VERSION

All other parameters can remain at default values unless advanced tuning is required.
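
As a quick pre-flight check, the two lists above can be compared against the variables you have actually set. This Python sketch assumes the configuration is available as a plain dictionary of string values; that representation and the function `review_config` are hypothetical and not part of the platform.

```python
# Variable names taken from the two lists in this summary.
REQUIRED = ("CODECS_PRIORITY", "SESSION_TIMEOUT_SEC")
RECOMMENDED = (
    "WHISPER_LANGUAGE",
    "USER_SALIENCE_TIMEOUT_MS",
    "INTERRUPTIONS_ARE_ALLOWED",
    "START_PHRASE",
    "SESSION_VERSION",
)


def review_config(config):
    """Return (missing_required, missing_recommended) variable names."""
    missing_required = [name for name in REQUIRED if name not in config]
    missing_recommended = [name for name in RECOMMENDED if name not in config]
    return missing_required, missing_recommended


# Example configuration (values copied from the tables in this guide).
config = {
    "CODECS_PRIORITY": "{}",
    "SESSION_TIMEOUT_SEC": "120",
    "WHISPER_LANGUAGE": "en",
    "START_PHRASE": "Hello, this is Anna. How can I assist you today?",
}

missing_required, missing_recommended = review_config(config)
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```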
