How to integrate Voice Technologies (Speech-to-Text, Text-to-Speech)
The AI agent can understand spoken queries using Speech-to-Text (STT) and reply with synthesized speech using Text-to-Speech (TTS), for example, during phone calls or through a voice interface.
Flametree can integrate with any external voice model that exposes an OpenAI-compatible API (for example, Azure Speech, Eleven Labs).
Speech-to-Text (STT) Integration
This integration allows the AI agent to transcribe user speech into text. It is used in voice channels (for example, SIP, Twilio) and in the Playground interface when the microphone is enabled.
Steps:
- Go to Settings > Connectivity.
- Open AI Models > Speech-to-text (STT).
- Select Add +.
- Enter the following information:
| Field | Description |
|---|---|
| Name, Description | Custom name and description of the integration |
| OpenAI-compatible API URL | Speech recognition server URL |
| Access Token | API access token from the provider |
| Type | Type of the integration. For example, whisper, local_whisper, 3i-vox |
| STT Model Name | STT model name. For local_whisper, for example, whisper-large, whisper-medium, whisper-small |
- Click Save.
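Before entering the URL and token, it can help to check that the server accepts an OpenAI-style transcription request. The sketch below is an illustration, not Flametree's own client. It assumes the server follows the OpenAI convention of a multipart POST to `/audio/transcriptions` under the base URL, with `model` and `file` form fields and a bearer token. The function names, the example URL, and the token are hypothetical.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url: str, token: str, model: str,
                                audio: bytes, filename: str = "sample.wav") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style transcription request."""
    boundary = uuid.uuid4().hex
    # Hand-rolled multipart/form-data body with two parts: "model" and "file".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        # Assumption: the endpoint path is appended to the configured base URL.
        base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the transcript from the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the server answers with OpenAI's default {"text": "..."} shape.
        return json.loads(resp.read().decode("utf-8"))["text"]
```

To run the check, read a short audio file into bytes, pass it to `build_transcription_request` together with the values you plan to enter in the OpenAI-compatible API URL, Access Token, and STT Model Name fields, and call `transcribe` on the result. If your provider uses a different path or authentication scheme, adjust the sketch accordingly.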
Text-to-Speech (TTS) Integration
This integration allows the AI agent to speak responses out loud. It is used during phone calls or when voice playback is enabled in the Playground.
Steps:
- Go to Settings > Connectivity.
- Open AI Models > Text-to-speech (TTS).
- Select Add +.
- Enter the following information:
| Field | Description |
|---|---|
| Name, Description | Custom name and description of the integration |
| OpenAI-compatible API URL | URL of the speech synthesis server |
| Access Token | API access token from the provider |
| TTS Model Name | Voice model name (for example, Polly.Salli, en-US-JennyNeural) |
| Voice Provider | Voice provider identifier, for example, azure or elevenlabs, depending on the platform |
| Region (optional) | Provider region. For Azure, for example, eastus or westeurope |
- Click Save.
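A similar check can be made for the speech synthesis server. The sketch below assumes the server follows the OpenAI convention of a JSON POST to `/audio/speech` under the base URL, with `model`, `voice`, and `input` fields, and that it returns raw audio bytes. How Flametree itself maps the TTS Model Name, Voice Provider, and Region fields onto the request is not described here, so the function names, parameters, and example values are hypothetical.

```python
import json
import urllib.request


def build_speech_request(base_url: str, token: str, model: str,
                         voice: str, text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        # Assumption: the endpoint path is appended to the configured base URL.
        base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(req: urllib.request.Request, path: str,
                       timeout: float = 60.0) -> int:
    """Send the request, save the returned audio, and return its size in bytes."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

Calling `synthesize_to_file(build_speech_request(...), "reply.mp3")` with your real URL, token, and voice name should produce a playable file if the server is reachable and the credentials are valid. The output format depends on the provider, so the `.mp3` extension is only a guess.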
Supported Platforms
Currently tested and supported:
- OpenAI Whisper
- Google Speech-to-Text (v2)
- Azure Cognitive Services (Speech)
- Eleven Labs (TTS)
Important Notes
- A single AI agent can use different voice models for STT and TTS.
- If you are using Twilio, there is no need to configure TTS/STT separately, because Twilio uses its own built-in voice engines.
- For SIP integration, you must configure STT and TTS manually.
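The channel rules above can be expressed as a small pre-deployment check. This is a minimal sketch with hypothetical names (`AgentVoiceSetup`, `missing_voice_integrations`), not part of Flametree. It covers only the two channels the notes mention and rejects anything else rather than guessing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentVoiceSetup:
    """Hypothetical summary of one agent's voice configuration."""
    channel: str                            # "sip" or "twilio"
    stt_integration: Optional[str] = None   # name of the STT integration, if any
    tts_integration: Optional[str] = None   # name of the TTS integration, if any


def missing_voice_integrations(setup: AgentVoiceSetup) -> List[str]:
    """Return which integrations still have to be configured for the channel."""
    channel = setup.channel.lower()
    if channel == "twilio":
        # Twilio uses its own built-in voice engines, so nothing is required.
        return []
    if channel == "sip":
        # SIP needs both integrations; they may point to different providers.
        missing = []
        if setup.stt_integration is None:
            missing.append("STT")
        if setup.tts_integration is None:
            missing.append("TTS")
        return missing
    raise ValueError(f"No rule known for channel: {setup.channel!r}")
```

For example, a SIP agent with only an STT integration assigned would be reported as missing TTS, while a Twilio agent with nothing assigned would pass.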