The situation is similar to that of the Speech-to-Text system (see sttd).
To keep personal data safe you cannot rely on public services, and this solution helps with that.
The server is written in C and is based on wstk, espeak, onnx, piper and its models.
It runs on regular servers and responds fast enough to build real-time dialog systems.
This is a commercial product. If you are interested in purchasing it or have any questions,
please visit the contact page.
An evaluation period is available, with installation on your own servers (Ubuntu 22.04 x64 preferred).
Keeps your personal data safe
There are open models for various languages
You don't need to purchase or rent expensive hardware
There's a module for integration with Freeswitch (see mod_sivr_tts)
Lets you use the speech API from the dialplan or scripts
Saves memory and improves performance by sharing models
Lets you integrate the TTS service into various applications (see the example below)
Request:
curl -q http://127.0.0.1:8802/v1/speech -X POST -H "Authorization: Bearer secret" -H "Content-Type: application/json; charset=utf-8" -d '{"language":"en","samplerate":8000,"format":"mp3","input":"Hello, how can I help you?"}'
The response is an MP3 stream that you can save or play back.
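The same request can also be sent from application code. The following is a minimal Python sketch, not an official client. It uses only the standard library and reuses the URL, token, and JSON fields from the curl example. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical, the audio format field is assumed to be spelled `format`, and the server is assumed to return the raw audio bytes in the response body.

```python
import json
import urllib.request

# Values copied from the curl example; adjust them for your installation.
TTS_URL = "http://127.0.0.1:8802/v1/speech"
API_TOKEN = "secret"


def build_speech_request(text, language="en", samplerate=8000, audio_format="mp3"):
    """Build a POST request with the same headers and JSON body as the curl example."""
    payload = {
        "language": language,
        "samplerate": samplerate,
        "format": audio_format,  # assumed spelling of the field name
        "input": text,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def synthesize_to_file(text, path, timeout=30):
    """Send the request and write the returned audio bytes to `path`.

    Assumes the response body is the audio stream itself.
    Returns the number of bytes written.
    """
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

With the server running, calling `synthesize_to_file("Hello, how can I help you?", "hello.mp3")` should store the audio in `hello.mp3`. Building the request separately from sending it makes it easy to inspect the headers and body without a live server.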