Text-to-Speech system

Similar to the speech-to-text system (see sttd).
It was written for cases where everything has to run locally without sacrificing speed.

Written in C, using the libraries lame, speex DSP, espeak, onnx, and piper.
It runs on regular servers and responds fast enough to build real-time dialogue systems.

Price: $350 / 350 USDT
For purchase questions, please visit the contact page.
A trial period with installation on your servers is available (Ubuntu 22.04 x64 preferred).

Basic features:

Neural-based, fully local system
Doesn't depend on any online services; all data is processed locally
Multilingual support
Open models are available for various languages
Runs on regular servers
No need to purchase or rent expensive hardware
FreeSWITCH support
Available in the dialplan and in scripts
An integration module is provided (mod_sivr_tts)
Model preloading and caching
Saves memory and improves performance
Simple web API
Easy integration with various applications
Supported formats
- wav
- mp3
Supported OS
- Linux

--- Examples ---

Request:
curl -q http://127.0.0.1:8802/v1/speech -X POST -H "Authorization: Bearer secret" -H "Content-Type: application/json; charset=utf-8" -d '{"language":"en","samplerate":8000,"format":"mp3","input":"Hello, how can I help you?"}'

The response is an mp3 stream that you can save or play back.
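
The same request can also be sent from application code. The following is a minimal Python sketch, not an official client: the URL, token, and JSON fields are copied from the curl example, the "format" key is assumed to be the name of the output-format field, and the helper names build_speech_request and synthesize_to_file are hypothetical. It also assumes the whole response body is the audio data.

```python
import json
import urllib.request

# Values copied from the curl example; adjust them for your installation.
API_URL = "http://127.0.0.1:8802/v1/speech"
API_TOKEN = "secret"


def build_speech_request(text, language="en", samplerate=8000, fmt="mp3",
                         url=API_URL, token=API_TOKEN):
    """Build the POST request from the curl example (hypothetical helper)."""
    payload = {
        "language": language,
        "samplerate": samplerate,
        "format": fmt,  # assumed field name for the output format
        "input": text,
    }
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def synthesize_to_file(text, path, timeout=30, **kwargs):
    """Send the request and write the returned audio bytes to `path`.

    Returns the number of bytes written. Assumes the response body
    contains only the audio stream.
    """
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

With the server running, calling synthesize_to_file("Hello, how can I help you?", "hello.mp3") should write the audio to hello.mp3. Building the request in a separate function keeps the payload easy to inspect without a running server.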