This is a software kit for building fully autonomous, privacy-preserving intelligent IVR services based on FreeSWITCH.
The kit consists of various modules/libraries, usually supplied as a boxed solution (a ready-made server, or it can be installed on your virtual machine); some modules are also available separately.
solving typical user problems in dialogue mode, fully automatically
transcribing (classifying) calls and forwarding them to your ticket system
making outbound calls with reminders/confirmations and so on
receiving, filtering, and forwarding calls and so on
No. It takes a different approach: in contrast to an LLM (where the model tries to cover everything possible), it uses lightweight NLP models trained specifically for your tasks, with the dialogue logic written in JavaScript.
This allows fully autonomous systems (running on your side) with maximum performance on quite cheap hardware.
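The combination described above (lightweight NLP classification plus JavaScript routing) can be sketched roughly as follows. This is an illustrative sketch only: the function names and the keyword-based classifier stand in for the kit's trained NLP models and are not its actual API.

```javascript
// Illustrative sketch: intent-driven dialogue logic.
// In the real kit, a trained NLP model would return the intent label;
// here we fake classification with keywords. All names are hypothetical.
function classifyIntent(utterance) {
  const text = utterance.toLowerCase();
  if (text.includes("balance")) return "check_balance";
  if (text.includes("operator") || text.includes("agent")) return "transfer_to_human";
  return "unknown";
}

// The dialogue logic maps intents to IVR actions (answer, transfer, ...).
function handleUtterance(utterance) {
  switch (classifyIntent(utterance)) {
    case "check_balance":
      return { action: "say", text: "Your current balance is ..." };
    case "transfer_to_human":
      return { action: "transfer", target: "operators" };
    default:
      return { action: "say", text: "Sorry, could you rephrase that?" };
  }
}

console.log(handleUtterance("I want to talk to an operator"));
```

Because the models are small and the routing is plain JavaScript, the whole decision step runs locally in milliseconds, which is what keeps the hardware requirements low.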
Local TTS/STT services ensure the safety of users' personal data and biometrics.
For example, a Core 2 Quad with 4 GB of memory is enough to handle 2-5 simultaneous dialogues.
Of course, if your security policy allows the use of external GPT services (such as ChatGPT/DeepSeek/...) or you have the resources to run them locally, they can be integrated into the system without any problems.
Unfortunately, an LLM is still a fairly resource-intensive solution, and not everyone can afford to run one in production. Likewise, using public GPT services in such systems might lead to the following troubles:
Initially, LLM models know nothing about your business, so for normal operation you have to fine-tune them on your data (which may contain confidential information); consequently, all of it becomes available to third parties (the service owners, and possibly others)
Similarly to the item above, the same happens with user data: during a conversation with the system, some personal information may be required (contract/ID number, PIN code and so on), and all of it also becomes available to third parties
This potential problem might be more dangerous than all the previous ones put together.
When a service uses any external Speech-to-Text provider, nothing prevents that provider from collecting voice fingerprints extracted from the audio fragments (unless special means are used to avoid this).
These can then be used to identify users by voice (without their knowledge) or in voice synthesizers to create fakes...
As mentioned before, an LLM is quite an expensive and resource-intensive solution; even if the token price is low for you, the actual request costs several times more on the provider's hardware (so it is only a matter of time before the token price suddenly changes for you).
The responsiveness of the system (the delay before a response is generated) also plays an important role: on basic tariffs you can easily get delays of 20+ seconds, which makes dialogues uncomfortable and irritates users.
For asrd/ttsd the list of languages is quite large, because they use open-source models (see the component description list).
NLP supports the following languages: Russian, English (can be expanded if necessary).
High-performance speech-to-text service (pure C)
Capabilities: offline transcription (local models, retraining), online transcription (working with external services), multilingual mode.
Languages: English, German, French, Russian, full list
API: JSON-RPC, simple HTTP
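A transcription request to asrd over its JSON-RPC API might look like the sketch below. The method name ("transcribe") and parameter shapes are assumptions for illustration; consult the service's actual API reference for the real contract.

```javascript
// Hypothetical JSON-RPC 2.0 request to the asrd speech-to-text service.
// Method and parameter names are illustrative, not the documented API.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "transcribe",
  params: {
    audio: "<base64-encoded audio fragment>", // placeholder, not real data
    language: "en",
  },
};

// The payload would be POSTed to the service endpoint, e.g.:
// fetch("http://localhost:8080/rpc", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(request),
// }).then(r => r.json()).then(res => console.log(res.result));

console.log(JSON.stringify(request));
```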
High-performance text-to-speech service (pure C)
Capabilities: offline synthesis (local models, retraining), online synthesis (working with external services), multilingual mode.
Languages: English, German, French, Russian, full list
API: JSON-RPC, simple HTTP
High-performance language processing service (Java)
Capabilities: working with small datasets, fast retraining, multilingual mode.
Languages: English, Russian (core models)
API: WebUI + JSON-RPC
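For the NLP service, a classification call over JSON-RPC could be shaped as in this sketch. The method name ("classify") and the response layout shown in the comment are assumptions for illustration only.

```javascript
// Hypothetical JSON-RPC 2.0 request to the NLP classification service.
// Method name and result shape are illustrative, not the documented API.
const nlpRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "classify",
  params: {
    text: "I forgot my PIN code",
    language: "en",
  },
};

// A response might carry the predicted intent label and a confidence score,
// for example: { "jsonrpc": "2.0", "id": 2, "result": { "label": "...", "score": 0.93 } }

console.log(JSON.stringify(nlpRequest));
```

The WebUI mentioned above would typically sit on top of the same RPC layer, using it for dataset management and retraining.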
WebUI for managing the components (Java + JavaScript)
Class 5 softswitch + sivr modules