Bartels Platina Speech Recognition System
Bartels Speech Recognition is a novel type of system providing real-time access to a largely speaker-independent vocabulary of up to several thousand active words. Once the system has been trained speaker-independently, there is no need to re-train it for new users. Recognition is carried out spontaneously. The real-time behaviour of the system allows for the continuous recognition of more than 100 words per minute, so user interaction is not slowed down by the recognition process. The highly sophisticated features of Bartels Speech Recognition allow for the implementation of applications such as
Bartels Speech Recognition is a fully professional hardware-based system.
The spoken word is recorded with a high-quality condenser table microphone (AKG C580). Digitizing is carried out in the microphone socket, right before transmission to the PC. The signal is evaluated in a frequency range from 0 to 4 kHz with fourfold oversampling. The microphone is connected to the socket with a standard XLR plug. Before the A/D conversion takes place, the differential input signal is first raised by a high-quality, extremely low-noise input amplifier and then adjusted by up to 100 dB by a software-controlled VCA. The conversion is done with 12-bit precision; 14-bit converters may be used optionally. Higher resolutions are not useful, since the 12-bit VCA signal level adjustment has already taken place, thus yielding a dynamic range of more than 20 bits.
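The dynamic range figure can be checked with a rough calculation. The sketch below assumes an ideal linear converter (about 6.02 dB per bit) and assumes that the VCA adjustment range simply adds to the converter's range; neither assumption is stated in the description above, and the function names are illustrative only.

```python
import math

def bits_to_db(bits: int) -> float:
    """Ideal dynamic range in dB of a linear converter with the given word length."""
    return 20.0 * math.log10(2 ** bits)

def db_to_bits(db: float) -> float:
    """Equivalent word length in bits for a given dynamic range in dB."""
    return db / (20.0 * math.log10(2))

ADC_BITS = 12          # from the text: 12-bit conversion
VCA_RANGE_DB = 100.0   # from the text: level adjustment of up to 100 dB

adc_db = bits_to_db(ADC_BITS)        # range covered by the converter alone
total_db = adc_db + VCA_RANGE_DB     # assumption: VCA range adds to converter range
total_bits = db_to_bits(total_db)    # equivalent word length of the combination

print(f"converter alone : {adc_db:.1f} dB")
print(f"with VCA        : {total_db:.1f} dB, about {total_bits:.1f} bits")
```

Under these assumptions the combination covers well over the 20 bits quoted above, which is consistent with the statement that a higher converter resolution brings little benefit.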
The whole analog part of the system is galvanically isolated using opto-couplers and DC/DC converters, and is shielded by a solid aluminium case. This casing also provides excellent vibration damping.
The system hardware is designed to give a correct system response for any speaker-to-microphone distance from 0.2 to 2.0 meters.
The sampling frequency can be raised to as much as 100 kHz by software. This makes it possible to use the system in applications other than speech recognition, so the system is well prepared for future applications.
The speech signal digitized within the microphone socket is transferred to the PC via a bi-directional link. This link is prepared for speech output as well.
The PC/AT-slot compatible card contains two independent digital signal processors working in parallel and a speech database processor developed especially for this purpose. The speech database processor consists of a custom IC and 8 MByte of onboard DRAM for vocabulary storage. Parallel computing in several internal ALU blocks, using a new algorithm, allows for the error-tolerant processing of database queries within a few milliseconds.
The first digital signal processor performs a 512-point FFT. The resulting 256 frequency bands are transferred to the second digital signal processor through a patented, novel type of weighting network (neural net). The classification output of the second digital signal processor is passed to the speech database processor for an error-tolerant search for the word or parts of it.
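The algorithm used by the speech database processor is not disclosed here. Purely as an illustration of what an error-tolerant search can look like, the following sketch matches a sequence of classification labels against stored patterns using edit distance. The label values, the vocabulary contents, and the names edit_distance and tolerant_lookup are all hypothetical and are not part of the Bartels system.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # label missing from b
                           cur[j - 1] + 1,           # extra label in b
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]

def tolerant_lookup(query, vocabulary, max_errors=1):
    """Return the closest word within max_errors edits, or None."""
    best_word, best_dist = None, max_errors + 1
    for word, pattern in vocabulary.items():
        dist = edit_distance(query, pattern)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word

# Hypothetical vocabulary: each word maps to a made-up label sequence.
VOCAB = {
    "tail": [7, 3, 3, 9],
    "rail": [5, 3, 3, 9],
    "stop": [2, 7, 8, 4],
}

# One label was dropped from the query; the lookup still resolves it.
print(tolerant_lookup([7, 3, 9], VOCAB))
# A query far from every stored pattern is rejected.
print(tolerant_lookup([1, 1, 1, 1], VOCAB))
```

A software loop like this scans the vocabulary word by word; the text above attributes the millisecond response time to parallel computation in dedicated hardware, which this sketch does not attempt to model.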
As described above, the whole speech recognition is done on the PC slot card. This frees the PC CPU for other tasks such as syntax evaluation and text processing.
Any dependence on the fundamental frequency of a speaker's voice is eliminated by a new algorithm in the pre-processing section. The weighting networks tolerate different resonances. This contributes to the system's speaker independence. Additionally, the database is able to store and adjust different pronunciation variants for each word.
The high resolution of 256 frequency bands (16 Hz per frequency band) and the high analysis rate (100 FFTs per second) enable the system to distinguish, instantly and speaker-independently, even very similar words such as 'tail' and 'rail', provided that the system has been trained properly and the words are pronounced clearly.
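The figures quoted for the spectral analysis can be related to one another with a short calculation. The sketch below takes the 512-point FFT, the 16 Hz band width, and the 100 FFTs per second from the text; the effective sample rate, the frame length, and the frame overlap are derived values that assume the usual relation "band width = sample rate / FFT size" and are not stated in the description.

```python
FFT_POINTS = 512          # from the text
BAND_WIDTH_HZ = 16        # from the text
FFTS_PER_SECOND = 100     # from the text

bands = FFT_POINTS // 2                          # usable bands of a real-input FFT
sample_rate_hz = FFT_POINTS * BAND_WIDTH_HZ      # assumed: band width = fs / N
bandwidth_hz = bands * BAND_WIDTH_HZ             # upper edge of the analysed range
frame_ms = 1000 * FFT_POINTS / sample_rate_hz    # time span covered by one FFT
hop_ms = 1000 / FFTS_PER_SECOND                  # time between successive FFTs
overlap = 1 - hop_ms / frame_ms                  # fraction shared by adjacent frames

print(f"{bands} bands up to {bandwidth_hz} Hz at {sample_rate_hz} Hz sampling")
print(f"frame {frame_ms} ms, hop {hop_ms} ms, overlap {overlap:.0%}")
```

Under these assumptions the 256 bands span roughly the 0 to 4 kHz range, and successive analysis frames overlap heavily, since a new spectrum is produced every 10 ms while each one covers a much longer stretch of signal.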
The system can be operated using either a C function library or our training and recognition software. From the PC's point of view, speech recognition is done in the background.
The training can be done by the user as well as by Bartels System. It is also possible to re-train or delete the vocabulary predefined by Bartels System.
Usually, training with one speaker is sufficient for easily distinguishable words. For the training of similar words such as 'tail' and 'rail', we recommend having the system trained by a few different speakers. This enables the system to separate the significant, word-determining differences from speaker dependencies.
During the training phase, the system uses a trial-and-error method to quickly learn to recognize words that are pronounced in different ways. In contrast to some cheap "game" programs, the high resolution enables the system to hear the subtle characteristics of human language.
The slot card may be used in PC/AT-compatible computers with a 386, 486, or Pentium CPU. Main memory of at least 2 MByte is recommended. The card occupies a 16-bit slot and 16 I/O addresses, e.g. 0320h to 032fh. A hard disk with a capacity of at least 40 MByte is required.
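When choosing a base address, it can help to check the card's 16-address block against the ranges of other installed cards. The following sketch shows the arithmetic for the example range given above; the second device and its addresses are invented for illustration, and the helper names are hypothetical.

```python
def port_range(base: int, count: int) -> range:
    """I/O addresses occupied by a card: `count` consecutive ports from `base`."""
    return range(base, base + count)

def conflicts(a: range, b: range) -> bool:
    """True if two I/O address ranges share at least one port."""
    return a.start < b.stop and b.start < a.stop

speech_card = port_range(0x320, 16)      # example from the text: 0320h to 032fh
other_card = port_range(0x300, 32)       # hypothetical second card: 0300h to 031fh

print(f"speech card: {speech_card.start:04X}h to {speech_card[-1]:04X}h")
print("conflict with other card:", conflicts(speech_card, other_card))
```

If the check reports a conflict, one of the two cards would need a different base address.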
The Bartels Speech Recognition software runs as a protected-mode program under DOS. For proper use of the C function library we recommend High-C 386 together with PharLap tools or Windows Extender. The latest information about support for other compilers is available on request.
The text files created by the training and recognition software are stored in ASCII format and can be edited with almost any text editor or word processor.
Please contact us for information on how to install the Bartels Speech Recognition system's hardware and software and integrate them into industrial environments.