VoiceInfo
03 Dec 2008 04:29 UTC 2008338+0429 UTC

Development Page--Not for Official Use

Voiceinfo (or is it VoiceInfo?)

VoiceInfo is CBI-DNR's package that allows voice reporting of real-time information over the telephone system. The package is used as the basis for the Corpus Christi Real-Time Navigation System, WindInfo, and the Freeport FlowInfo systems.

VoiceInfo is written in Perl and makes use of voicemodems such as those manufactured by US Robotics, Inc. Voicemodems are modems that have been augmented to have voice recording and playback capabilities (i.e., to allow a computer to act as an answering machine). Thus, they are capable of detecting touch-tone signals from callers and playing back pre-programmed voice instructions or phrases. Voicemodems are fairly inexpensive (less than $150) and can be easily controlled from Unix or Linux through an RS-232 interface.

Recently we have noticed some problems with voiceinfo "stuttering" when playing back phrases or other details. The problem is related to the system load on the host computer (currently dnr.cbi.tamucc.edu): as more processes run on the system, the voiceinfo software is unable to send data to the modem fast enough for uninterrupted speech. Ultimately the problem is a design flaw in the voiceinfo software itself, which uses an ad-hoc "timing" parameter to throttle the flow of data to the voice modem; under load, the throttled flow is apparently too slow.

The voiceinfo software is available via SVN. The official "executing" copy is in /u3/voiceinfo/bin/voiceinfo. The problem subroutine is PlayPhrases:

 
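sub PlayPhrases {
    my(@phrases) = @_;
    # Flatten the arguments into a list of individual phrase names.
    my(@p) = split(' ',join(' ',@phrases));
    my(@o,$kbd);
    # Names that start with a digit are expanded by PhraseNumber.
    foreach(@p) { 
        if (/^\d/) { push @o,PhraseNumber($_); }
        else { push @o,$_; }
    }
    # Put the modem into voice transmit mode and wait for CONNECT.
    Remote::Puts($MODEM,"AT#VTX\r"); Remote::Wait($MODEM,"CONNECT\r\n",10);
    foreach(@o) {
        # Skip any phrase whose audio file cannot be opened.
        open(GSM,"$args{'gsmdir'}/$_.gsm") || next;
        # Send the file one 33-byte chunk at a time.
        while (sysread(GSM,$d,33)) {
            # Double every 0x10 byte in the audio data.
            $d =~ s/\x10/\x10\x10/g;
            # Wrap the chunk in its header and trailer bytes.
            Remote::Puts($MODEM,"\xfe\xfe$d\x00\xa5\xa5");
            # The 'hittime' delay and key-press check; 'last' leaves only the while loop.
            last if (Remote::Hit($MODEM,$args{'hittime'})); 
        }
        close(GSM);
    }
    if (Remote::Hit($MODEM,0)) { 
        # Input is pending from the modem: read it, then send 0x10 0x18.
        $kbd = Remote::Gets($MODEM,0.019,'\x10.');
        Remote::Puts($MODEM,"\x10\x18");
    } else { Remote::Puts($MODEM,"\x10\x03"); }  # no input: send 0x10 0x03
    Remote::Wait($MODEM,"VCON",30);
    # Return whatever was read from the modem (undef if nothing).
    $kbd;
}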
 

PlayPhrases is responsible for getting the voice modem to play a sequence of phrases. The subroutine accepts a list of phrase names as arguments; each phrase name corresponds to a ".gsm" audio file on disk that contains the audio of the phrase. PlayPhrases simply sends the audio files to the voice modem in the order specified by the arguments (it's basically a fancy cat(1)).

The details of the protocol for sending phrases to the modem are described in VoiceModemProtocol.
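
As a rough illustration, the per-chunk framing that PlayPhrases performs can be pulled out into a small function. The name frame_gsm_chunk is hypothetical and does not exist in voiceinfo. The byte values are copied from the subroutine above, and the reading of the 0x10 doubling as an escape for the modem's command prefix is an assumption, not something this page confirms.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one audio chunk the way PlayPhrases does
# before writing it to the modem.
sub frame_gsm_chunk {
    my ($chunk) = @_;
    # Double every 0x10 byte so that audio data is (presumably) not
    # mistaken for the start of a two-byte 0x10 command sequence.
    (my $escaped = $chunk) =~ s/\x10/\x10\x10/g;
    # Header and trailer bytes exactly as they appear in PlayPhrases.
    return "\xfe\xfe" . $escaped . "\x00\xa5\xa5";
}

# A 33-byte chunk with no 0x10 bytes becomes 38 bytes on the wire.
my $frame = frame_gsm_chunk("A" x 33);
printf "%d bytes on the wire\n", length $frame;
```

Each 0x10 byte in the audio adds one more byte to the frame, so the size on the wire varies slightly from chunk to chunk.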

The difficulty arises in that we would like the phrases to be interruptible if the user presses a telephone key before the set of phrases has completed. The current implementation of PlayPhrases accomplishes this by sending a small chunk (33 bytes, or 20ms) of audio to the voicemodem, and then checking to see if the user pressed a key. If yes, the phrases are aborted; otherwise the next chunk of audio is sent and the key press checked again.
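
The 33-byte, 20ms figure is consistent with GSM 06.10 full-rate audio, which encodes 160 samples taken at 8000 Hz into 260 bits. That the ".gsm" files use this encoding is an assumption here, not something this page states. A small sketch of the arithmetic under that assumption:

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Assumed GSM 06.10 full-rate parameters (not stated on this page).
my $SAMPLE_RATE_HZ    = 8000;
my $SAMPLES_PER_FRAME = 160;
my $BITS_PER_FRAME    = 260;

# Duration of audio carried by one frame, in milliseconds.
sub frame_duration_ms { return 1000 * $SAMPLES_PER_FRAME / $SAMPLE_RATE_HZ }

# Whole bytes needed to hold one frame (260 bits is 32.5 bytes).
sub frame_size_bytes { return ceil($BITS_PER_FRAME / 8) }

# Frames the modem must receive each second for gap-free playback.
sub frames_per_second { return $SAMPLE_RATE_HZ / $SAMPLES_PER_FRAME }

printf "%d bytes every %d ms, %d frames per second\n",
    frame_size_bytes(), frame_duration_ms(), frames_per_second();
```

Under these assumptions the send loop has to complete about 50 passes per second, each including a key-press check, to keep the speech continuous.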

Because both Unix and the voicemodem have some serial buffers in place, it's possible for several seconds of audio data to be stored in the buffer when the user presses a key. Thus, when a user presses a key, the audio may continue playing for a few seconds before it's aborted, possibly fooling the user into believing his keypress wasn't detected. So, the audio needs to terminate as soon as the user presses a key.
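
To get a feel for the size of the effect, the sketch below converts a number of buffered bytes into milliseconds of audio. The 38-byte frame size comes from the header, audio, and trailer bytes that PlayPhrases sends. The 4096-byte buffer size is a hypothetical value chosen only for illustration, since the actual buffer sizes are not given on this page.

```perl
use strict;
use warnings;

my $FRAME_MS         = 20;         # audio per chunk, per the text above
my $FRAME_WIRE_BYTES = 2 + 33 + 3; # header + audio + trailer, as sent by PlayPhrases

# Milliseconds of audio held in a buffer containing the given number of bytes.
# Ignores the extra bytes added when 0x10 bytes are doubled.
sub buffered_audio_ms {
    my ($buffered_bytes) = @_;
    return $buffered_bytes / $FRAME_WIRE_BYTES * $FRAME_MS;
}

# 4096 is a hypothetical buffer size, not a measured value.
printf "%.1f s of audio in a 4096-byte buffer\n", buffered_audio_ms(4096) / 1000;
```

A buffer of that size would hold roughly two seconds of speech, which matches the "few seconds" of lag described above.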

The current implementation of PlayPhrases tries to "solve" this problem by placing a delay after each chunk of data sent to the voicemodem, so that the buffers never get very full. This is the purpose of the "hittime" parameter in the subroutine, which specifies the amount of delay between packets. Unfortunately, this parameter is highly system- and load-dependent. If the delay is too long, the packets aren't sent quickly enough and we get the "stutter effect". If the delay is too short, the packets are sent too quickly and the buffers fill, causing the audio to continue even after the user presses a key. A delay that works on an idle system is probably too long for a busy system.
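
The trade-off can be seen in a toy model of the send loop. This is a simplified simulation, not the voiceinfo code: simulate_pacing and its overhead parameter (standing in for time lost to system load on each pass) are hypothetical, and real buffers have a size limit that the model ignores.

```perl
use strict;
use warnings;

my $FRAME_MS = 20;    # audio carried by each chunk

# Returns (total silence in ms, audio still queued in ms) after sending
# $frames chunks with a fixed delay plus per-pass overhead.
sub simulate_pacing {
    my ($frames, $delay_ms, $overhead_ms) = @_;
    my $period_ms = $delay_ms + $overhead_ms;    # time taken by one loop pass
    my ($backlog_ms, $gap_ms) = (0, 0);
    for (1 .. $frames) {
        $backlog_ms += $FRAME_MS;                # queue one chunk
        if ($backlog_ms >= $period_ms) {
            $backlog_ms -= $period_ms;           # modem plays during the pass
        } else {
            $gap_ms += $period_ms - $backlog_ms; # queue ran dry: stutter
            $backlog_ms = 0;
        }
    }
    return ($gap_ms, $backlog_ms);
}

# 500 chunks is 10 seconds of audio.
for my $case ([19, 0], [19, 5], [15, 0]) {
    my ($delay, $overhead) = @$case;
    my ($gap, $backlog) = simulate_pacing(500, $delay, $overhead);
    print "delay=$delay overhead=$overhead: silence=${gap}ms queued=${backlog}ms\n";
}
```

In this model a 19ms delay plays cleanly on an idle system but stutters once load adds 5ms per pass, while a 15ms delay lets the queued audio grow steadily.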

The "correct" design would be to have voiceinfo and the voicemodem immediately flush their buffers and stop playing audio as soon as the user presses a telephone key. The documents Pm read that described the voicemodem audio protocol indicated that this is possible, but Pm was never able to get it to work properly. (Pm actually gave up and decided to go with the timing route instead because he knew the other solution would "work" at the time, even though it wasn't a clean design.) Given the current stuttering problems with voiceinfo, it looks like it's time to implement the "correct" design instead of the "trial-and-error" design.
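
One possible starting point for the host side of such a design is sketched below. It is untested against a real voicemodem and rests on several assumptions: abort_playback is a hypothetical helper, $fh is assumed to be a filehandle opened on the modem's serial device (voiceinfo itself goes through the Remote package, whose internals are not shown on this page), and whether the modem discards its own buffer on receiving 0x10 0x18 is exactly the part that was never made to work.

```perl
use strict;
use warnings;
use POSIX qw(tcflush TCOFLUSH);

# Hypothetical helper: stop playback as soon as a key press is seen.
# Returns (flushed, sent) as 1/0 flags.
sub abort_playback {
    my ($fh) = @_;
    # Discard audio the kernel has queued for the port but not yet
    # transmitted. tcflush returns undef if $fh is not a terminal.
    my $flushed = tcflush(fileno($fh), TCOFLUSH);
    # The same two-byte sequence PlayPhrases sends after a key press.
    # Assumption: the modem drops its buffered audio when it sees this.
    my $sent = syswrite($fh, "\x10\x18");
    return ($flushed ? 1 : 0, (defined $sent && $sent == 2) ? 1 : 0);
}
```

With something like this in place, the send loop would no longer need a tuned delay to keep the buffers nearly empty; it would only need to notice the key press and call the helper.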

Footnotes

1. Possible explanation of 33 byte, 20ms packet.

Page last modified on May 29, 2008, at 10:08 AM