Speech-to-Text system

Speech-to-text is a key component of voice interaction systems.
Many online services offer similar functionality, but most of them raise security concerns (data leaks) and can add latency to the interaction (slow responses).
For these reasons, a local solution is often needed.
This system was written for exactly such cases.

Written in C, using the mpg123 library and the alphacephei framework.
It runs on regular servers and responds fast enough to build real-time dialogue systems.

Price: $350 / 350 USDT
For purchase questions, please visit the contact page.
A trial period with installation on your servers is provided (Ubuntu 22.04 x64 preferred).

Basic features:

Neural-based, fully local system
Doesn't depend on any online services; all data is processed locally
Multilingual support
There are open models for various languages
Model fine-tuning
Tools for fine-tuning are provided
Context dictionary
Lets a request define a dictionary of allowed words
Speaker identification
Generates a vector for speaker identification
Runs on regular servers
For example: IBM x3550-M3
Supported in FreeSWITCH
Available in dialplan and scripts
There's a module for integration (mod_sivr_asr)
Model preloading and caching
Saves memory and improves performance
Simple web API
Easy integration with various applications
Supported formats
- wav
- mp3
- l16
Supported OS
- Linux

--- Examples ---

Example #1 (simple request)

Request:
curl http://127.0.0.1:8801/v1/transcriptions -X POST -H "Authorization: Bearer secret" -H "Content-Type: multipart/form-data" -F language="en" -F smodel="small" -F file="@test.mp3"

Response (json):
{
 "text" : "hello world"
}
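
The same request can also be sent from Python using only the standard library. This is a minimal sketch, not part of the product: the helper names build_multipart and transcribe are hypothetical, and it assumes the server accepts a standard multipart/form-data body with the fields used in the curl call above (language, smodel, file) and returns JSON with a "text" key.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(head.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://127.0.0.1:8801/v1/transcriptions",
               token="secret", language="en", smodel="small"):
    """POST an audio file and return the recognized text (hypothetical helper)."""
    with open(path, "rb") as fh:
        audio = fh.read()
    body, content_type = build_multipart(
        {"language": language, "smodel": smodel},
        "file", os.path.basename(path), audio,
    )
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With the server running locally, transcribe("test.mp3") would correspond to the curl call in Example #1.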

Example #2 (with speaker identification)

Request:
curl http://127.0.0.1:8801/v1/transcriptions -X POST -H "Authorization: Bearer secret" -H "Content-Type: multipart/form-data" -F language="en" -F smodel="small" -F vmodel="default" -F file="@test.mp3"

Response (json):
{
 "spk" : [-0.644623, 1.023342, 2.575434, 0.623447, -0.602342, 1.0234234, -1.4824234, -0.021242, 0.824297, -0.152424, ... ],
 "spk_frames" : 81,
 "text" : "hello world"
}
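
The "spk" vector can be compared with a vector stored earlier for a known speaker. One common way to do this is cosine similarity, sketched below. This is an illustration only and not necessarily how the system itself is meant to be used: the function name cosine_similarity is hypothetical, and any decision threshold would have to be tuned on your own recordings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Placeholder vectors for illustration; real ones come from the "spk" field
# of two separate responses.
enrolled = [0.5, -1.0, 2.0]
incoming = [0.4, -0.9, 2.1]
score = cosine_similarity(enrolled, incoming)
```

A score close to 1 suggests the two recordings come from the same speaker, while a score near 0 or below suggests different speakers.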