Innovative Speech-To-Text Answering Machine
The Speech-To-Text Auto-Responder lets a subscriber who uses the mobile operator’s “Voice Mail” service receive voice messages from other subscribers both as an audio file (MP3) and as text transcribed from the recording. Messages can also be redirected to other convenient channels, such as instant messengers (Telegram), SMS, or the subscriber’s self-service mobile application (in particular, “Subscriber Assistant”).
OLSOFT has extensive experience working with mobile operators and is well aware of the specifics and issues of their business. In partnership with the STC Group of Companies (Russian Federation), it offers an innovative Speech-To-Text Answering Machine solution based on speech recognition technologies, designed to improve subscriber loyalty and expand use of the “Voice Mail” service.
Components of the Solution
The Speech-To-Text Auto-Responder is a client-server solution. The server hosts the database, which stores all the information the system needs to operate; a web application through which client applications obtain that information; a voice message switch; and a set of APIs for the switch and for internal and external systems.
The client part includes mobile self-service applications for subscribers. Instant messengers such as Telegram can also be used as a client application.
- Speech recognition service
- Channel Distribution Service
- FreeSwitch API
- API for the "Mobile Assistant" system
- Database "STT Autoresponder"
- A set of APIs for external systems
- Statistics and monitoring service (metrics)
- Reporting Service
FreeSwitch receives a call from the operator’s systems and then issues an API request to verify that a voice message can be recorded for the called party. If the called party’s number has an active subscription and its message limit has not been reached, FreeSwitch records a voice message for the called party with a maximum duration of 30 seconds. The recorded WAV file and the call metadata are then transferred to the API.
The speech-to-text agent converts speech to text:
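The eligibility check described above can be sketched as follows. This is a minimal illustration, not the product's actual API: the `Subscription` type, its field names, and the function name are assumptions made for the example. Only the two conditions (active subscription, limit not reached) and the 30-second cap come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Maximum recording duration stated in the description above.
MAX_RECORDING_SECONDS = 30


@dataclass
class Subscription:
    """Hypothetical record for a called party's Voice Mail subscription."""
    active: bool
    messages_used: int
    message_limit: int


def can_record_voice_message(subscription: Optional[Subscription]) -> bool:
    """Return True if a voice message may be recorded for the called party.

    Both conditions must hold: the subscription is active and the
    message limit has not been reached. A number with no subscription
    record is treated as not eligible (an assumption of this sketch).
    """
    if subscription is None:
        return False
    return subscription.active and subscription.messages_used < subscription.message_limit
```

In this sketch the switch would call `can_record_voice_message` before starting a recording and stop the recording after `MAX_RECORDING_SECONDS`.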
- The service runs as a background process in the OS.
- The agent checks the queue for messages once per second and accepts up to 10 messages per iteration. For each message, it takes the path to the audio file and uses the configured recognizer to transcribe the speech into text in the required language. Speech recognition can be performed either online (via an API service) or offline.
- Libraries and services for speech recognition: CRT SpeechPro, Mozilla DeepSpeech
- Supported languages: Russian, English, Kazakh
- The recognized text is written to the database and linked to the message, and a message is then placed in the queue for the Recognized Speech Dispatch Agent.
The agent that distributes recognized voice messages checks the queue once per second and accepts up to 10 messages per iteration. From each message it takes the text and the subscriber number and, using the configured message channel, sends the message through the channel API to the account that corresponds to the subscriber number. The message is delivered only if the subscriber’s number is linked to a channel:
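The polling behaviour of the speech-to-text agent can be sketched roughly as below. The in-memory `queue.Queue`, the dictionary standing in for the database, the message keys (`id`, `path`, `language`, `subscriber`), and the `recognize` callable are all assumptions of this example; the real system's queue, storage, and recognizer interfaces are not described in the text. Only the 1-second interval and the batch size of 10 are taken from it.

```python
import queue
import time

POLL_INTERVAL_SECONDS = 1  # polling interval stated in the text
BATCH_SIZE = 10            # messages accepted per iteration


def take_batch(inbox, limit=BATCH_SIZE):
    """Remove and return up to `limit` messages without blocking."""
    batch = []
    while len(batch) < limit:
        try:
            batch.append(inbox.get_nowait())
        except queue.Empty:
            break
    return batch


def run_iteration(inbox, outbox, recognize, store):
    """Process one batch and return how many messages were handled.

    `recognize(path, language)` is a hypothetical stand-in for the
    configured recognizer; `store` stands in for the database.
    """
    batch = take_batch(inbox)
    for message in batch:
        text = recognize(message["path"], message["language"])
        store[message["id"]] = text  # bind the text to the message
        # Hand the result over to the dispatch agent's queue.
        outbox.put({"id": message["id"], "text": text,
                    "subscriber": message["subscriber"]})
    return len(batch)


def run_forever(inbox, outbox, recognize, store):
    """Background loop: one batch per polling interval."""
    while True:
        run_iteration(inbox, outbox, recognize, store)
        time.sleep(POLL_INTERVAL_SECONDS)
```

Separating `run_iteration` from `run_forever` keeps the batch logic testable without waiting on the timer.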
- Supported channels: Telegram messenger, Beeline Uzbekistan mobile application
- For the Telegram channel, messages are sent to accounts from the number specified in the agent configuration. The sending number does not need to be saved in the contacts of the recipient accounts.
- A Telegram message consists of text and an MP3 file. The MP3 file is generated just before the message is sent and is deleted after sending, as well as in case of an error.
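The delivery rules above (deliver only to linked numbers, create the MP3 just before sending, delete it afterwards even on error) can be illustrated with a short sketch. The `dispatch` function, the `channel_links` mapping, and the `make_mp3` and `send` callables are hypothetical placeholders for the channel API and the audio conversion step, which the text does not specify.

```python
import os
import tempfile


def dispatch(message, channel_links, make_mp3, send):
    """Deliver one recognized message; return True if it was sent.

    `channel_links` maps subscriber numbers to channel accounts.
    `make_mp3(path)` writes the audio file and
    `send(account, text, mp3_path)` calls the channel API; both are
    assumptions of this sketch.
    """
    account = channel_links.get(message["subscriber"])
    if account is None:
        # The subscriber's number is not linked to the channel.
        return False

    # Reserve a temporary path for the MP3 just before sending.
    fd, mp3_path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        make_mp3(mp3_path)
        send(account, message["text"], mp3_path)
        return True
    finally:
        # Runs on success and on error, so the file never lingers.
        if os.path.exists(mp3_path):
            os.remove(mp3_path)
```

Putting the cleanup in a `finally` block is one straightforward way to guarantee the file is removed whether sending succeeds or raises.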