
Build Your First Voice Agent

This guide explains how to create a Voice AI Agent on the Flametree platform. By the end, you will have a working agent that can handle calls, understand speech, and reply with a natural-sounding voice.

Your Voice AI agent will:

  • Make and receive phone calls via SIP
  • Follow a defined conversation flow (workflow)
  • Collect key details like the user’s name, intent, and callback time

Prerequisites

You need a configured SIP integration before creating a Voice AI agent. If you haven’t set it up yet, follow the SIP integration guide. This integration allows your agent to make and receive calls through the SIP protocol.

Configuration Overview

To create and configure a Voice AI Agent, you will complete these five steps:

  1. Create a Voice Agent – start a new agent in the dashboard.
  2. Configure Core Agent Settings, Workflow, and Models – define identity, task, speech style, conversation flow, and the LLM, TTS, and STT engines.
  3. Connect SIP Integration – link the agent to a voice channel.
  4. Set Max Open Sessions – limit the number of simultaneous calls.
  5. Configure Environment Variables – fine-tune timing, recognition, and session behavior.

Step 1: Create a Voice Agent

  1. Go to the AI Agents section in your Flametree dashboard.
  2. Click Create new agent.
  3. Select Voice Agent type:
    • Inbound Voice Agent – answers incoming calls
    • Outbound Voice Agent – makes calls and can also receive them

Tip: Outbound agents are useful for campaigns, surveys, and callback workflows. They can also handle inbound calls automatically.

Step 2: Configure Core Agent Settings, Workflow, and Models

Set up how your Voice Agent looks, speaks, and thinks. In this step, you’ll define its identity, conversation flow, and AI models.

Main parameters

  • Identity – the agent’s name or role
  • Speech Style and Language – how the agent talks
  • Task – the agent’s main purpose or goal

Workflow (conversation logic)

The Workflow defines your voice agent’s conversation steps. Update the description section to outline how the dialogue should progress. For details, see the Workflow configuration guide.

Models

In the Models section, choose which models your agent will use to understand and respond during calls.

| Category | Description | Example |
|---|---|---|
| LLM | Main language model that drives reasoning and response generation | gpt-4.1, qwen2.5-instruct, qwen3-instruct |
| Text-to-Speech (TTS) | Converts text into natural-sounding voice | Female Azure TTS |
| Speech-to-Text (STT) | Converts caller speech into text | Flametree Whisper |

Note: The Voice Agent supports only instruct models (for example, gpt-4.1, qwen2.5-instruct, qwen3-instruct). The LLM speed affects response delay — choose the fastest model that meets your quality needs.

Step 3: Connect SIP Integration

Your Voice Agent uses SIP to make or receive calls. To connect an existing integration:

  1. Go to the Communication Channels section.
  2. Click the + (plus) button.
  3. Select SIP.
  4. Select the SIP integration you created earlier.
  5. Click Save.

Step 4: Set Max Open Sessions

In Advanced Settings on the right panel, set Max Opened Sessions — this limits the number of simultaneous outbound calls.

Step 5: Configure Environment Variables

Use environment variables to fine-tune how your Voice Agent behaves during calls. These parameters control timing, recognition quality, session behavior, and call flow.

Core Parameters

These variables define how the Voice Agent operates and interacts with callers.

| Variable | Description | Recommended value | Required / Recommended |
|---|---|---|---|
| CODECS_PRIORITY | List or dictionary defining the preferred audio codecs for SIP calls. Leave `{}` for automatic negotiation. | `{}` | Required |
| SESSION_TIMEOUT_SEC | Time (in seconds) before a session closes after the last message. Should not be shorter than the longest speech segment in the dialogue. | 120 | Required |
| WHISPER_LANGUAGE | Language code for Whisper STT (set it if you know the expected caller language, for better recognition). | en | Recommended |
| USER_SALIENCE_TIMEOUT_MS | Time (in milliseconds) before the session closes after the last human message. Should not be shorter than the longest AI speech in the dialogue. | 100000 | Recommended |
| INTERRUPTIONS_ARE_ALLOWED | Allow users to interrupt AI speech. | False | Recommended |
| START_PHRASE | Agent’s opening phrase. Keeps greetings consistent. | "Hello, this is Anna. How can I assist you today?" | Recommended |
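Put together, a minimal environment for a new agent might look like the sketch below. The values are the recommended ones from the table; the exact syntax for supplying them depends on how your deployment consumes environment variables.

```shell
# Minimal Voice Agent environment (recommended values from the table above)
CODECS_PRIORITY={}                # empty dict -> automatic codec negotiation
SESSION_TIMEOUT_SEC=120           # >= longest speech segment in the dialogue
WHISPER_LANGUAGE=en               # expected caller language for Whisper STT
USER_SALIENCE_TIMEOUT_MS=100000   # >= longest AI speech in the dialogue
INTERRUPTIONS_ARE_ALLOWED=False   # callers cannot talk over the agent
START_PHRASE="Hello, this is Anna. How can I assist you today?"
```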

Voice Detection and Timing

These variables adjust how the Voice Activity Detection (VAD) system detects speech and pauses.

| Variable | Description | Default | Required / Recommended |
|---|---|---|---|
| VAD_THRESHOLD | Speech sensitivity. Higher = less sensitive. | 0.65 | Recommended |
| VAD_SPEECH_PROB_WINDOW | Window size for calculating speech probability. | 3 | Recommended |
| VAD_MIN_SPEECH_DURATION_MS | Minimum speech duration to register as valid. | 250 | Recommended |
| VAD_MIN_SILENCE_DURATION_MS | Minimum silence duration to register as a pause. | 350 | Recommended |
| VAD_SPEECH_PAD_MS | Additional buffer (in milliseconds) before and after speech segments. | 700 | Recommended |
| LONG_PAUSE_OFFSET_MS | Defines a long pause (used to detect intent to end or wait). | 850 | Recommended |
| SHORT_PAUSE_OFFSET_MS | Defines a short pause (used for natural conversation pacing). | 200 | Recommended |
| VAD_CORRECTION_ENTER_THRESHOLD | Threshold for entering speech detection mode. | 0.6 | Recommended |
| VAD_CORRECTION_EXIT_THRESHOLD | Threshold for exiting speech detection mode. | 0.35 | Recommended |
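The separate enter (0.6) and exit (0.35) thresholds form a hysteresis band: the detector needs a high probability to declare speech, but only drops back to silence once the probability falls well below that, which prevents flickering on borderline frames. A minimal sketch of that logic (the function and its defaults are illustrative, not part of the platform):

```python
def update_speech_state(is_speaking: bool, prob: float,
                        enter: float = 0.6, exit_: float = 0.35) -> bool:
    """Return the new speech state for one audio frame.

    Hysteresis: enter speech only above `enter`, leave it only
    below `exit_`; probabilities in between keep the current state.
    """
    if not is_speaking:
        return prob >= enter
    return prob >= exit_

# A borderline probability (0.5) keeps whatever state we were in:
states = []
speaking = False
for p in [0.2, 0.7, 0.5, 0.5, 0.3, 0.5]:
    speaking = update_speech_state(speaking, p)
    states.append(speaking)
# states == [False, True, True, True, False, False]
```

Raising VAD_CORRECTION_EXIT_THRESHOLD narrows the band and makes the agent give up on faint trailing speech sooner.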

SIP and Logging Parameters

These settings control SIP signaling and logging verbosity.

| Variable | Description | Default | Required / Recommended |
|---|---|---|---|
| SIP_EARLY_EOC | Enables early end-of-call signaling in SIP. Usually disabled. | false | Recommended |
| PJSIP_LOG_LEVEL | SIP log verbosity. Higher value = more detailed logs. | 4 | Recommended |

At minimum, set the following:

  • CODECS_PRIORITY
  • SESSION_TIMEOUT_SEC

It is strongly recommended to also set:

  • WHISPER_LANGUAGE
  • USER_SALIENCE_TIMEOUT_MS
  • INTERRUPTIONS_ARE_ALLOWED
  • START_PHRASE
  • SESSION_VERSION

All other parameters can stay at default values unless you need fine-tuning.
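Because the two session timeouts must cover the longest speech segments in your dialogue, it can help to sanity-check them before deploying. A small sketch of such a check (the function and its inputs are illustrative, assuming you can estimate the longest segments from your script):

```python
def check_session_timeouts(session_timeout_sec: float,
                           user_salience_timeout_ms: float,
                           longest_speech_sec: float,
                           longest_ai_speech_sec: float) -> list:
    """Flag timeout values that violate the guidance above."""
    problems = []
    if session_timeout_sec < longest_speech_sec:
        problems.append("SESSION_TIMEOUT_SEC shorter than longest speech segment")
    if user_salience_timeout_ms < longest_ai_speech_sec * 1000:
        problems.append("USER_SALIENCE_TIMEOUT_MS shorter than longest AI speech")
    return problems

# Recommended values easily cover a 30 s user turn and a 45 s AI turn:
# check_session_timeouts(120, 100000, 30, 45) -> []
```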

Your Voice Agent is now ready to handle calls 🚀