How to Integrate Voice Technologies (Speech-to-Text, Text-to-Speech)

The AI agent can also understand spoken queries using Speech-to-Text (STT) and reply with synthesized speech using Text-to-Speech (TTS), for example during phone calls or in a voice interface.
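Conceptually, a voice conversation chains the two integrations around the agent: STT turns the caller's audio into text, the agent answers in text, and TTS turns the answer back into audio. The sketch below illustrates only that data flow; `handle_voice_turn` and the stage signatures are hypothetical and do not describe Flametree's internal implementation.

```python
from typing import Callable

# Hypothetical stage signatures. In a real deployment the STT and TTS stages
# would call the configured integrations; here they are plain functions.
SpeechToText = Callable[[bytes], str]
Agent = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def handle_voice_turn(audio: bytes, stt: SpeechToText, agent: Agent,
                      tts: TextToSpeech) -> bytes:
    """Run one conversational turn: caller audio in, spoken reply out."""
    user_text = stt(audio)         # Speech-to-Text: transcribe the caller
    reply_text = agent(user_text)  # the AI agent answers in text
    return tts(reply_text)         # Text-to-Speech: synthesize the reply
```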

Flametree supports integration with any external voice model that exposes an OpenAI-compatible API (e.g., Azure Speech, ElevenLabs).

Speech-to-Text (STT) Integration

This integration allows the AI agent to transcribe user speech into text. It is used in voice channels (e.g., SIP, Twilio) and in the Playground interface when the microphone is enabled.

Steps:

1. Go to the Integrations section
2. Find the Speech-to-text (STT) integration type and click Add +
3. Fill in the following fields:

| Field | Description |
|---|---|
| Name, Description | Custom name and description of the integration |
| OpenAI-compatible API URL | URL of the speech recognition server |
| Access Token | API access token from the provider |
| Type | Integration type, e.g., `whisper`, `local_whisper`, or `3i-vox` |
| STT Model Name | Name of the STT model; for `local_whisper`, e.g., `whisper-large`, `whisper-medium`, or `whisper-small` |

4. Click Save
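Before clicking Save, you may want to confirm that the URL, token, and model name work outside Flametree. The following is a minimal sketch using only the Python standard library. It assumes the server implements OpenAI's `POST /audio/transcriptions` endpoint (a multipart form with `file` and `model` fields, authenticated with a bearer token) and that the API URL already includes the version prefix, e.g. `https://api.openai.com/v1`. The function name is hypothetical, and the Type field is a Flametree setting that is not part of this request.

```python
import urllib.request
import uuid


def build_transcription_request(api_url: str, access_token: str, model: str,
                                audio: bytes,
                                filename: str = "speech.wav") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style transcription request.

    Assumption: api_url already contains the version prefix, e.g. .../v1.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field: the STT model name entered in the integration form.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode("utf-8"),
        # File field: the recorded audio clip.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"),
        audio,
        # Closing boundary ends the multipart body.
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen`; OpenAI's hosted API returns JSON with the transcript in a `text` field. `whisper-1` is OpenAI's hosted model name, while a local Whisper server would expect its own model names.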

Text-to-Speech (TTS) Integration

This integration allows the AI agent to speak responses out loud. It is used during phone calls or when voice playback is enabled in the Playground.

Steps:

1. Go to the Integrations section
2. Find the Text-to-speech (TTS) integration type and click Add +
3. Fill in the following fields:

| Field | Description |
|---|---|
| Name, Description | Custom name and description of the integration |
| OpenAI-compatible API URL | URL of the speech synthesis server |
| Access Token | API access token from the provider |
| TTS Model Name | Voice model name, e.g., `Polly.Salli` or `en-US-JennyNeural` |
| Voice Provider | Voice platform in use, e.g., `azure` or `elevenlabs` |
| Region (optional) | For Azure, the service region, e.g., `eastus` or `westeurope` |

4. Click Save
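A matching sketch for TTS, assuming the server implements OpenAI's `POST /audio/speech` endpoint, which takes a JSON body with `model`, `voice`, and `input` and returns audio bytes (MP3 by default on OpenAI's API). OpenAI separates the model (e.g. `tts-1`) from the voice (e.g. `alloy`), whereas this form has a single TTS Model Name field holding a voice such as `en-US-JennyNeural`; how Flametree maps its fields onto the request is not documented here, so the split below is an assumption. Azure Speech and ElevenLabs also have their own native APIs, which this sketch does not cover. The function name is hypothetical.

```python
import json
import urllib.request


def build_speech_request(api_url: str, access_token: str, model: str,
                         voice: str, text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech synthesis request.

    Assumption: api_url already contains the version prefix, e.g. .../v1.
    """
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and writing `resp.read()` to a file yields the synthesized audio.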

Supported Platforms

Currently tested and supported:

- OpenAI Whisper
- Google Speech-to-Text (v2)
- Azure Cognitive Services (Speech)
- ElevenLabs (TTS)

Important Notes

- A single AI agent can use different voice models for STT and TTS.
- If you are using Twilio, you do not need to configure STT and TTS separately: Twilio uses its own built-in voice engines.
- For SIP integrations, you must configure both STT and TTS manually.
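The channel rules in these notes can be summarized as a small lookup. This is an illustrative helper only: the channel labels and function name are hypothetical, and the Playground cases follow the descriptions of when each integration is used (microphone for STT, voice playback for TTS).

```python
def voice_integrations_needed(channel: str, mic_enabled: bool = False,
                              playback_enabled: bool = False) -> set[str]:
    """Return which Flametree voice integrations a channel relies on."""
    if channel == "twilio":
        return set()  # Twilio uses its own built-in voice engines
    if channel == "sip":
        return {"stt", "tts"}  # SIP requires both to be configured manually
    if channel == "playground":
        needed: set[str] = set()
        if mic_enabled:
            needed.add("stt")  # microphone input is transcribed by STT
        if playback_enabled:
            needed.add("tts")  # voice playback is synthesized by TTS
        return needed
    raise ValueError(f"unknown channel: {channel!r}")
```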
