Realtime Text for the IVR mode in FreeSWITCH

Realtime Text is a method for exchanging keystrokes during SIP calls, character by character as they are typed. It is currently used mostly by people with hearing impairments, but it could also help the rest of us interact more fluently with Interactive Voice Response (IVR) systems, especially when using a softphone.


Demonstration call

The demo below shows me calling the Dutch tax office; being self-employed, I have to call them regularly. The first part is the usual, tedious call; the second part is simulated with FreeSWITCH and enhanced with text. As you can see, presenting the menu options immediately speeds up the process enormously.

TODO: Figure out how to record screens; include a demonstration video

What is Realtime Text?

Just as there are media flow definitions for audio and video, there is one for text, in RFC 4103. In essence, it describes how to carry keystrokes in RTP packets, and how to negotiate such a text stream in SDP during SIP call setup. It is possible to negotiate a combined session, for instance audio plus text, so that an IVR can run both in parallel. Note that the text stream is only negotiated in addition to audio if the client offers it.
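To make the RFC 4103 encoding concrete, here is a minimal Python sketch of a text packet without redundancy: a standard 12-byte RTP header (RFC 3550) followed by the typed characters encoded as UTF-8. The payload type 98 is an assumption (text uses a dynamic payload type agreed in SDP, for example a=rtpmap:98 t140/1000), and the helper names are hypothetical; this is not code from the FreeSWITCH patch.

```python
import struct

# RFC 4103 registers T.140 text as "t140/1000": timestamps count milliseconds.
T140_CLOCK_RATE = 1000
# RTP fixed header (RFC 3550): V/P/X/CC byte, M/PT byte, seq, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def build_t140_packet(text, seq, timestamp_ms, ssrc,
                      payload_type=98, marker=False):
    """Wrap typed text in an RTP header (hypothetical helper).

    payload_type=98 is an ASSUMED dynamic type; the real value comes from SDP.
    """
    byte0 = 2 << 6  # version 2; no padding, no header extension, no CSRCs
    byte1 = (0x80 if marker else 0x00) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                             timestamp_ms & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + text.encode("utf-8")  # T.140 text is UTF-8


def parse_t140_packet(packet):
    """Inverse of build_t140_packet; ignores padding and header extensions."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    start = RTP_HEADER.size + 4 * (byte0 & 0x0F)  # skip any CSRC entries
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "text": packet[start:].decode("utf-8"),
    }


if __name__ == "__main__":
    # The caller types "ja" 0.3 s into the session.
    pkt = build_t140_packet("ja", seq=1,
                            timestamp_ms=round(0.3 * T140_CLOCK_RATE),
                            ssrc=0x1234ABCD, marker=True)
    print(pkt.hex(), parse_t140_packet(pkt))
```

The marker bit is a parameter because RFC 4103 uses it to flag the first packet after an idle period; the text payload itself is nothing more than the UTF-8 bytes of what was typed.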

Realtime Text originated in the hearing-impaired community and is advocated by the R3TF.

How about privacy?

Privacy may not matter much when you use Realtime Text to talk to an IVR, but it becomes more interesting when you chat with another end user. Since Realtime Text runs over RTP, it can be protected with ZRTP, just like audio. This gives users a highly pragmatic form of privacy that they do not have with email or XMPP chat, let alone with commercial services such as WhatsApp, Facebook or SMS. ZRTP protection is end-to-end, meaning that a properly set up connection cannot be tapped by anyone along the intermediate path. We believe that privacy is not a privilege for a select few, but that everyone should be able to enjoy it.

What FreeSWITCH commands are added?

In your dialplan, you can use a few new commands to steer the use of Realtime Text. First, there is an explicit acceptance of text:

<action application="accept_text" />
<action application="answer"/>

As shown, accept_text is applied before answering a call. This enables the negotiation of Realtime Text during the call setup, but only when the caller offers it.
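The rule that text is only negotiated when the caller offers it follows the SDP offer/answer model (RFC 3264): the answer contains one m= line per offered stream, in the same order, and a declined stream is answered with port 0. Below is a minimal Python sketch of that decision, assuming a hypothetical accept_text flag that mirrors the dialplan application and made-up local ports. It is not the Sofia SIP code, and for brevity it echoes every offered format instead of selecting codecs.

```python
from typing import List

# Example offer from a caller that supports audio plus Realtime Text.
OFFER = "\r\n".join([
    "v=0",
    "o=- 1 1 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
    "m=text 49172 RTP/AVP 98",
    "a=rtpmap:98 t140/1000",
])


def build_answer_media(offer_sdp: str, accept_text: bool,
                       audio_port: int, text_port: int) -> List[str]:
    """Return the answer's m= lines (hypothetical helper).

    accept_text stands in for the dialplan having run accept_text before
    answer; audio_port and text_port are made-up local RTP ports.
    """
    answer = []
    for line in offer_sdp.splitlines():
        if not line.startswith("m="):
            continue
        media, _port, proto, *formats = line[2:].split()
        fmt = " ".join(formats)  # simplification: echo all offered formats
        if media == "audio":
            answer.append(f"m=audio {audio_port} {proto} {fmt}")
        elif media == "text" and accept_text:
            answer.append(f"m=text {text_port} {proto} {fmt}")
        else:
            # RFC 3264: a declined stream keeps its m= line, with port 0.
            answer.append(f"m={media} 0 {proto} {fmt}")
    return answer


if __name__ == "__main__":
    print(build_answer_media(OFFER, True, 5000, 5002))
    print(build_answer_media(OFFER, False, 5000, 5002))
```

A caller that offers only audio never receives a text stream, whatever the flag says, because the loop only answers m= lines that were actually offered.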

There is also a command to send text, which is particularly useful just before entering the IVR portion of a dialplan:

<action application="send_text" data="Het is weer voorbij die mooie zomer" />

This action is silently skipped if no text support was set up for the channel, which is the case if accept_text was not executed or if the calling client did not negotiate Realtime Text. Otherwise, it sends the text given in the data attribute.

Finally, there is an IVR command that plays an (interruptible) audio prompt and then receives text. It is advisable to accept numeric codes as well, so that callers can also interact using DTMF:

<action application="send_and_receive_text" data="3 25 3 4000 \s# tone_stream://%(10000,0,425) tone_stream://%(1000,0,600) entered ^.*$" />
<action application="log" data="NOTICE Realtime Text is =${entered}=\n" />

The Realtime Text flow is integrated with the internal DTMF support of FreeSWITCH, so DTMF and keystrokes can be mixed and the user is free to choose either mechanism. With Realtime Text, a caller could for example type a keyword rather than memorising a code. Cleverly written dialplans could therefore even support callers who want to receive Realtime Text but prefer to send DTMF. In the example above, either # or a space ends the input that is stored in the channel variable ${entered}; the terminator itself is not part of the variable's contents.
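The input phase can be sketched in Python as one loop over a single merged stream of DTMF digits and Realtime Text characters. The field order of the data attribute is an assumption, borrowed from FreeSWITCH's play_and_get_digits (min, max, tries, timeout, terminators, prompt, invalid prompt, variable, regex); the helper names are hypothetical, and timeouts and retries are not modelled. Backspace handling follows T.140, where U+0008 erases the previous character.

```python
import re

BACKSPACE = "\u0008"  # T.140 erasure: remove the previous character
BOM = "\ufeff"        # zero-width no-break space; carries no input


def parse_args(data):
    """Split the data attribute.

    ASSUMPTION: fields follow play_and_get_digits order:
    min max tries timeout terminators prompt invalid_prompt variable regex.
    """
    (min_len, max_len, tries, timeout_ms, terms,
     prompt, invalid, var, regex) = data.split()
    return {
        "min": int(min_len), "max": int(max_len),
        "tries": int(tries), "timeout_ms": int(timeout_ms),  # not modelled
        "terminators": terms.replace("\\s", " "),  # "\s" stands for a space
        "prompt": prompt, "invalid": invalid, "var": var, "regex": regex,
    }


def collect_input(keys, args):
    """One input attempt over a merged stream of DTMF digits and RTT keys.

    Returns the collected text without its terminator, or None when it is
    too short or does not match the regex (hypothetical helper).
    """
    buf = []
    for key in keys:
        if key == BOM:
            continue
        if key == BACKSPACE:
            if buf:
                buf.pop()
            continue
        if key in args["terminators"]:
            break  # the terminator itself is not stored
        buf.append(key)
        if len(buf) >= args["max"]:
            break
    text = "".join(buf)
    if len(text) < args["min"] or not re.fullmatch(args["regex"], text):
        return None  # the real command would play the invalid prompt and retry
    return text


if __name__ == "__main__":
    args = parse_args(r"3 25 3 4000 \s# tone_stream://%(10000,0,425) "
                      r"tone_stream://%(1000,0,600) entered ^.*$")
    print(collect_input("taxx\bes please", args))  # RTT keystrokes
    print(collect_input("123#", args))             # DTMF digits
```

With the data string from the example, typing "taxx", a backspace, "es" and a space yields "taxes", while the DTMF sequence 1 2 3 # yields "123"; in both cases the terminator is dropped.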

Discussion about improved integration

It is quite possible to integrate Realtime Text even further. Where the say: facility is used, the spoken text could be transmitted over Realtime Text along with the rendered audio. Most similar facilities could be handled in one stroke, in the function switch_ivr_speak_text_handle(). This would offer better support to hearing-impaired callers, but only where text is rendered into audio, so its impact would be limited. Furthermore, the text would arrive at the same pace as the audio, so it would not improve the responsiveness of an Interactive Voice Response system. It might therefore encourage lame textual solutions; it is much better to send out a dedicated text just before the interactive portion starts.

Related to this, one may wonder whether implicitly accepting text interaction would be beneficial at all; it would make the accept_text application unnecessary. As before, a well-designed textual interaction will always be better than a computer's stab at it, but people with hearing impairments may appreciate the implicit option.

Trying it

The current patch is a working and usable implementation of Realtime Text for FreeSWITCH. It would add value if adopted into the mainstream, but it also has a few pragmatic limitations that you may want to know about:

  • FreeSWITCH is a (fast-)moving target, and this patch applies to one particular version; a stable version, mind you. We hope to get the patch integrated upstream to make it generally available.
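  • Realtime Text uses the ISO 10646-1 character set, encoded as UTF-8. This overlaps with ASCII but is not the same: ASCII characters are encoded as single bytes, while all other characters take two to four bytes. In addition, terminals use a few special control characters, such as backspace to erase the previous character.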
  • Redundancy in the Realtime Text encoding is not implemented in this patch. FreeSWITCH does not do this for audio either, although text arguably depends more on the extra reliability of redundant transmission, since a lost packet means lost characters. Basic hooks for redundancy are present in the patch, albeit commented out. To make them work, I would need a better understanding (notably, documentation) of how codecs handle RTP headers.
  • It does not implement the characters-per-second limit, which makes it less suitable for connecting to gateways that map Realtime Text onto similar text protocols on POTS (plain old telephone service) lines. In other words, just use it over the Internet at full speed.
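For reference, the redundancy that the patch leaves out is defined in RFC 4103 on top of the RTP payload for redundant audio data, RFC 2198: each packet repeats the last few T.140 blocks (generations) in front of the new one. Below is a minimal Python sketch of that payload layout. The payload type 98 for T.140 is an assumption (a typical negotiation is a=rtpmap:100 red/1000 with a=fmtp:100 98/98/98), the function names are hypothetical, and the RTP header itself is left out; this is not the commented-out code in the patch.

```python
import struct
from typing import List, Optional, Tuple


def build_red_payload(primary: bytes, primary_ts: int,
                      redundant: List[Tuple[bytes, int]],
                      t140_pt: int = 98) -> bytes:
    """Pack T.140 blocks into an RFC 2198 payload (hypothetical helper).

    redundant holds (data, timestamp) pairs of earlier generations, oldest
    first; the primary (newest) block always comes last. t140_pt=98 is an
    ASSUMED dynamic payload type.
    """
    headers, blocks = b"", b""
    for data, ts in redundant:
        offset = (primary_ts - ts) & 0xFFFFFFFF
        if offset >= 1 << 14 or len(data) >= 1 << 10:
            raise ValueError("offset or length does not fit the RED header")
        # F=1 | 7-bit block PT | 14-bit timestamp offset | 10-bit length
        word = (1 << 31) | ((t140_pt & 0x7F) << 24) | (offset << 10) | len(data)
        headers += struct.pack("!I", word)
        blocks += data
    headers += struct.pack("!B", t140_pt & 0x7F)  # F=0: final header, primary
    return headers + blocks + primary


def parse_red_payload(payload: bytes,
                      primary_ts: int) -> List[Tuple[int, int, bytes]]:
    """Return (payload type, timestamp, data) per block, oldest first."""
    pos = 0
    entries: List[Tuple[int, int, Optional[int]]] = []
    while payload[pos] & 0x80:  # F bit set: a 4-byte redundant-block header
        (word,) = struct.unpack_from("!I", payload, pos)
        offset, length = (word >> 10) & 0x3FFF, word & 0x3FF
        ts = (primary_ts - offset) & 0xFFFFFFFF
        entries.append(((word >> 24) & 0x7F, ts, length))
        pos += 4
    entries.append((payload[pos] & 0x7F, primary_ts, None))  # primary header
    pos += 1
    blocks = []
    for pt, ts, length in entries:
        end = len(payload) if length is None else pos + length
        blocks.append((pt, ts, payload[pos:end]))
        pos = end
    return blocks


if __name__ == "__main__":
    # Two earlier generations ("a" at 1400 ms, "b" at 1700 ms) plus new "c".
    red = build_red_payload(b"c", 2000, [(b"a", 1400), (b"b", 1700)])
    print(red.hex(), parse_red_payload(red, 2000))
```

A receiver that lost the packets carrying "a" and "b" can recover them from this one packet, using the timestamps to skip generations it has already seen. When there is nothing to repeat, empty blocks (length 0) keep the structure of the packet constant.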

To try out Realtime Text with FreeSWITCH, proceed as follows, assuming Linux:

  1. Check out FreeSWITCH from its git repository:

    git clone git://
    cd freeswitch
  2. Switch to the version known to be working with this patch:

    git checkout 8d614040168083f0dad5ca4a45bae7c06ed3de7c

    Note: skipping this step does not seem to cause problems; the head version (as checked in on Dec 5, 2012) leads to only one rejected hunk when applying the patch below, and that is easy to fix. However, this combination has not been tested yet. Furthermore, recent updates to the video handling code may be worth mirroring in the text code, so the port is not entirely trivial.

  3. Download the patch:

  4. Apply the patch:

    patch -p1 < rttpatch-v1-8d614040168083f0dad5ca4a45bae7c06ed3de7c
  5. Build and install FreeSWITCH as you normally would

  6. Set up conf/autoload_configs/modules.conf.xml to include:

    <load module="mod_rtt"/>

    Without this module, the commands introduced above are not available, and no Realtime Text interaction is possible. Note, however, that not all Realtime Text facilities are concentrated in this module; the structure of FreeSWITCH simply does not allow for that. Support for Realtime Text is also present in the Sofia SIP endpoint and even in the central switch code.

  7. Configure your dialplan with instructions as given above.

  8. Enjoy fast IVR support!