
Quick Start: Build Your First Voice Agent

This step-by-step guide walks you through creating a Voice AI Agent on the Flametree platform.
By the end of this tutorial, you’ll have a fully functional agent capable of handling inbound or outbound calls, understanding speech, and responding naturally with synthesized voice.


What You’ll Build

Your Voice AI agent will be able to:

  • Answer or initiate phone calls via SIP
  • Follow a defined conversation flow (workflow)
  • Collect key data such as customer name, intent, and callback time

Prerequisites


Step 1: Create a Voice Agent

  1. Go to the AI Agents section in your Flametree dashboard.
  2. Click Create Agent.
  3. Choose the Voice Agent type:
    • Inbound Voice Agent – answers incoming calls
    • Outbound Voice Agent – initiates calls to users and can also handle inbound calls

Tip: Outbound agents are often used in campaigns, surveys, or callback flows, but can also handle inbound calls automatically.


Step 2: Define Main Agent Parameters

Set the core configuration parameters of your Voice Agent.

Identity
Speech Style and Language
The Task describes the agent’s high-level purpose and goals.


Step 3: Configure the Workflow (Conversation Logic)

The Workflow defines your voice agent’s conversation steps.
You can edit the description section in the workflow to specify how the conversation should proceed.


Step 4: Select Models

In the Models section, select models for understanding and speaking.

| Category | Description | Example |
|---|---|---|
| LLM | Main language model | gpt-4.1, qwen2.5-instruct, qwen3-instruct |
| Text-to-Speech (TTS) | Converts text to voice output | Female Azure TTS |
| Speech-to-Text (STT) | Converts caller speech into text | Flametree Whisper |

Note: The Voice Agent supports only instruct models (e.g., gpt-4.1, qwen2.5-instruct, qwen3-instruct).
The speed of the LLM directly affects response delay — choose the fastest model that meets your quality requirements.
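
The selection rule above (fastest model that still meets your quality bar) can be sketched in a few lines of Python. This is only an illustration: the `ModelCandidate` type, the `pick_fastest_acceptable` helper, and every latency and quality number below are hypothetical placeholders, not measurements of the real models. Substitute figures from your own test calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCandidate:
    name: str
    avg_latency_ms: float  # measured by you; the values below are placeholders
    quality_score: float   # from your own evaluation, on a 0.0 to 1.0 scale


def pick_fastest_acceptable(candidates, min_quality):
    """Return the lowest-latency candidate that meets the quality bar, or None."""
    acceptable = [c for c in candidates if c.quality_score >= min_quality]
    if not acceptable:
        return None
    return min(acceptable, key=lambda c: c.avg_latency_ms)


# Placeholder measurements, NOT benchmarks of the real models.
candidates = [
    ModelCandidate("gpt-4.1", avg_latency_ms=900.0, quality_score=0.92),
    ModelCandidate("qwen2.5-instruct", avg_latency_ms=450.0, quality_score=0.81),
    ModelCandidate("qwen3-instruct", avg_latency_ms=600.0, quality_score=0.88),
]

best = pick_fastest_acceptable(candidates, min_quality=0.85)
print(best.name if best else "no model meets the quality bar")
```

Filtering on quality first and then minimizing latency keeps the two concerns separate, so raising or lowering the quality bar never requires changing the ranking logic.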


Step 5: Connect SIP Integration

Your Voice Agent uses SIP to send or receive calls.

Configuration Steps

  1. Go to the Communication Channels section.
  2. Click the + (plus) button.
  3. Select SIP.
  4. Choose the SIP integration you prepared earlier.
  5. Click Save.

Step 6: Set Max Opened Sessions

In the Advanced Settings section of the right panel, set Max Opened Sessions. This value limits the number of simultaneous outbound calls.
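
If you are unsure what value to enter, one rough way to size the limit is to estimate the average number of calls in progress at once and add some headroom. The sketch below is a back-of-the-envelope calculation, not a Flametree formula: the `estimate_max_sessions` helper, the `headroom` factor, and the sample traffic figures are all assumptions made for illustration.

```python
import math


def estimate_max_sessions(calls_per_hour, avg_call_duration_sec, headroom=1.5):
    """Estimate a session cap from average concurrent calls plus headroom."""
    # Average number of calls in progress at any moment.
    avg_concurrent = calls_per_hour * avg_call_duration_sec / 3600
    # Never return less than 1, otherwise no call could be placed.
    return max(1, math.ceil(avg_concurrent * headroom))


# Hypothetical campaign: 120 calls per hour, 3 minutes per call.
print(estimate_max_sessions(calls_per_hour=120, avg_call_duration_sec=180))  # 9
```

The headroom factor covers bursts above the average; tune it to how uneven your call traffic is.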


Agent Customization

Configure additional environment variables to fine-tune how your Voice Agent behaves during live calls.
These parameters control timing, recognition quality, session behavior, and call flow.

| Variable | Description | Recommended value |
|---|---|---|
| CODECS_PRIORITY | List or dictionary defining the preferred audio codecs for SIP calls. Leave {} for automatic negotiation. | {} |
| SESSION_TIMEOUT_SEC | Time (in seconds) before a session is closed after the last message. Should not be shorter than the longest speech segment in the dialogue. | 120 |
| WHISPER_LANGUAGE | Language code for Whisper STT (use if you know the expected caller language for better recognition). | en |
| USER_SALIENCE_TIMEOUT_MS | Time (in milliseconds) before session closes after the last human message. Should not be shorter than the longest AI speech in the dialog. | 100000 |
| INTERRUPTIONS_ARE_ALLOWED | Allow the human speaker to interrupt AI speech. | False |
| START_PHRASE | Fixed opening phrase for the agent. Helps control token usage and ensures consistent greetings. | "Hello, this is Anna. How can I assist you today?" |
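
The two timeout rules in the table are easy to get wrong because one value is in seconds and the other in milliseconds. The following Python sketch checks both against your own estimates of how long speech lasts in a dialogue. The `check_timeouts` helper and its two duration arguments are hypothetical names introduced here; the sketch also assumes the variables arrive as strings, as environment variables usually do.

```python
def check_timeouts(env, longest_segment_sec, longest_agent_speech_sec):
    """Return warnings for timeouts that are shorter than the expected speech."""
    warnings = []
    session_timeout_sec = float(env["SESSION_TIMEOUT_SEC"])
    user_timeout_ms = float(env["USER_SALIENCE_TIMEOUT_MS"])

    # Rule 1: the session timeout must cover the longest speech segment.
    if session_timeout_sec < longest_segment_sec:
        warnings.append(
            f"SESSION_TIMEOUT_SEC={session_timeout_sec:g} is shorter than "
            f"the longest speech segment ({longest_segment_sec:g} s)"
        )

    # Rule 2: the user timeout (in ms) must cover the longest AI speech.
    if user_timeout_ms < longest_agent_speech_sec * 1000:
        warnings.append(
            f"USER_SALIENCE_TIMEOUT_MS={user_timeout_ms:g} is shorter than "
            f"the longest AI speech ({longest_agent_speech_sec * 1000:g} ms)"
        )
    return warnings


# Recommended values from the table, stored as strings.
env = {"SESSION_TIMEOUT_SEC": "120", "USER_SALIENCE_TIMEOUT_MS": "100000"}

# Hypothetical estimates: longest segment 45 s, longest AI speech 30 s.
print(check_timeouts(env, longest_segment_sec=45, longest_agent_speech_sec=30))
```

With the recommended values and these sample estimates the list is empty; an AI monologue longer than 100 seconds would trigger the second warning.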

Voice Detection and Timing Parameters

These settings control the Voice Activity Detection (VAD) system, which decides when caller speech starts, pauses, and ends.

| Variable | Description | Default |
|---|---|---|
| VAD_THRESHOLD | Sensitivity threshold for detecting speech. Higher = less sensitive. | 0.65 |
| VAD_SPEECH_PROB_WINDOW | Window size for calculating speech probability. | 3 |
| VAD_MIN_SPEECH_DURATION_MS | Minimum speech duration to register as valid speech. | 250 |
| VAD_MIN_SILENCE_DURATION_MS | Minimum silence duration to register as pause. | 350 |
| VAD_SPEECH_PAD_MS | Additional buffer (in ms) before and after speech segments. | 700 |
| LONG_PAUSE_OFFSET_MS | Defines a long pause threshold (used to detect intent to end or wait). | 850 |
| SHORT_PAUSE_OFFSET_MS | Defines short pause timing (used for natural conversation pacing). | 200 |
| VAD_CORRECTION_ENTER_THRESHOLD | Threshold for entering speech detection mode. | 0.6 |
| VAD_CORRECTION_EXIT_THRESHOLD | Threshold for exiting speech detection mode. | 0.35 |
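
A separate enter and exit threshold usually indicates hysteresis: speech starts only when the probability rises above the higher value and ends only when it falls below the lower one, so brief dips in the middle of a word do not cut the segment. The Python sketch below illustrates that general pattern with the default values 0.6 and 0.35. It is an assumption about how such a pair of thresholds typically interacts, not a description of Flametree's internal VAD; the `segment_speech` function and the sample probabilities are hypothetical.

```python
def segment_speech(probs, enter=0.6, exit_=0.35):
    """Label each frame as speech (True) or silence (False) using hysteresis."""
    in_speech = False
    labels = []
    for p in probs:
        if not in_speech and p >= enter:
            in_speech = True   # probability rose above the enter threshold
        elif in_speech and p < exit_:
            in_speech = False  # probability fell below the exit threshold
        labels.append(in_speech)
    return labels


# Hypothetical per-frame speech probabilities from a VAD model.
probs = [0.1, 0.5, 0.7, 0.5, 0.4, 0.3, 0.5, 0.65]
print(segment_speech(probs))
# [False, False, True, True, True, False, False, True]
```

Note how the frames at 0.5 and 0.4 stay labeled as speech once the segment has started, while the same 0.5 value is not enough to start a new segment after silence.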

SIP and Logging Parameters

| Variable | Description | Default |
|---|---|---|
| SIP_EARLY_EOC | Enables early end-of-call signaling in SIP. Usually left disabled. | false |
| PJSIP_LOG_LEVEL | Verbosity level for SIP library logs. Higher value = more detailed logs. | 4 |

At minimum, you must set:

  • CODECS_PRIORITY
  • SESSION_TIMEOUT_SEC

It is strongly recommended to also set:

  • WHISPER_LANGUAGE
  • USER_SALIENCE_TIMEOUT_MS
  • INTERRUPTIONS_ARE_ALLOWED
  • START_PHRASE
  • SESSION_VERSION

All other parameters can remain at default values unless advanced tuning is required.
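
Before going live, it can help to audit your variable set against the two lists above. The Python sketch below does that for a plain dictionary of variables. The `audit_variables` helper and the sample `env` dictionary are hypothetical; how you export the variables from your agent configuration is up to you.

```python
REQUIRED = ("CODECS_PRIORITY", "SESSION_TIMEOUT_SEC")
RECOMMENDED = (
    "WHISPER_LANGUAGE",
    "USER_SALIENCE_TIMEOUT_MS",
    "INTERRUPTIONS_ARE_ALLOWED",
    "START_PHRASE",
    "SESSION_VERSION",
)


def audit_variables(env):
    """Return (missing_required, missing_recommended) variable names."""
    missing_required = [name for name in REQUIRED if name not in env]
    missing_recommended = [name for name in RECOMMENDED if name not in env]
    return missing_required, missing_recommended


# Hypothetical configuration with some recommended variables left unset.
env = {
    "CODECS_PRIORITY": "{}",
    "SESSION_TIMEOUT_SEC": "120",
    "WHISPER_LANGUAGE": "en",
    "START_PHRASE": "Hello, this is Anna. How can I assist you today?",
}

missing_required, missing_recommended = audit_variables(env)
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```

Treat a non-empty required list as a blocking error and a non-empty recommended list as a warning.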