3.4 Audio Formatting using Non-Speech Audio

This section describes the design and implementation of the sound component of the audio formatter. AFL variables *current-audio-state* and *global-audio-state* represent the local and global states of this component. Here, state is a point in sound space. The sound component provides operators for constructing new points. AFL assignment statements can be used to set the local and global states of the sound component in a manner similar to that described in the case of the speech component. The local scope introduced by AFL blocks also applies to *current-audio-state*.
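As an illustrative sketch (hypothetical; the primitives select-sound and local-set-state follow the naming used later in this section), a block can scope a change to the sound state:

```lisp
;; Sketch: *current-audio-state* is local to the block; the
;; assignment below is undone automatically when the block exits.
;; *some-cue* is an assumed name for a point in sound space.
(afl:new-block
 (afl:local-set-state
  (afl:select-sound afl:*current-audio-state* *some-cue*))
 (read-aloud object))
```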

Space of Sound Cues

We define the space of sound cues just as we defined the speech space. Things are a little more complicated in this case, because it is not so clear what all the dimensions are, or even whether the number of dimensions is finite. If by non-speech audio we mean any audible sound different from intelligible speech, the space is indeed very large. In order to use non-speech audio effectively, we need to restrict the space. Thus, in the following, the sound space is a suitably restricted subspace of the entire space of non-speech audio.

The following enumerates a few of the dimensions that could be used in constructing the non-speech component. Depending on the hardware available, more or fewer of these dimensions will be usable.

  1. Amplitude of sound.
  2. Pitch (fundamental frequency).
  3. Frequency of the different harmonics.
  4. Attenuation or resonance.
  5. Directionality.

We thus think of a point in this restricted subspace of non-speech audio as a distinct sound. Each channel of audio output is a point in an instance of such a subspace. Multiple channels of sound are thus modeled as a direct sum of these subspaces.

In the following, the sound space and the associated primitives for working in this space are defined assuming no restrictions on the underlying hardware. However, AsTeR restricts itself to the simpler setting provided by SPARC audio.

Types of Operators

The operators for moving in sound space are similar to those of the speech space (see Section 3.2). The distinguishing factor here is that sounds have duration, so the duration needs to be specified. This either takes the form of a simple time unit or is specified in terms of synchronizing the non-speech audio with events on other components, e.g., “play this sound until a particular event has completed”. As discussed in Section 3.3, the AFL block serves as the smallest unit of synchronization.

Synchronize and Play.

Primitive play-once waits until pending events on all audio components have finished executing before itself executing the event specified by the current point in sound-space. The action executed by play-once could itself be either synchronous or asynchronous. In either case, the duration of the event is specified explicitly as a time unit or implicitly by the nature of the event. Primitive synchronize-and-play is similar, except that the sound to be played is specified explicitly.
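The two primitives differ only in where the sound comes from, as the following hypothetical sketch illustrates (the calls mirror the descriptions above; *section-cue* is an assumed name):

```lisp
;; play-once: waits for pending audio events, then plays the
;; sound given by the current point in sound space.
(afl:play-once)

;; synchronize-and-play: same synchronization, but the sound to
;; play is passed explicitly.
(afl:synchronize-and-play *section-cue*)
```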

AsTeR typically uses this primitive to generate sounds to cue the beginning of the rendering of certain objects.

Play until Told to Stop.

Another type of synchronization primitive specifies that an event is to be repeated until certain other events occur. Thus, we can specify that a certain sound-space event is to be repeated for the duration of the rendering of an object. Here, the duration of the event is specified in terms of other events taking place in the audio formatter. We can picture this as turning on a conceptual switch on the audio player and turning it off at a later time. This is achieved by executing a loop-forever statement, as discussed in Section 3.3; such an event is terminated when the block in which it appears is ready to terminate.
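This can be sketched as follows (a hypothetical fragment; the primitive names follow those used elsewhere in this chapter):

```lisp
;; Sketch: repeat the currently selected sound for the duration
;; of a block.  loop-forever repeats its body asynchronously; the
;; repetition stops when the enclosing block is ready to terminate.
(afl:new-block
 (afl:loop-forever (afl:play-once))  ; conceptual switch turned "on"
 (read-aloud object))                ; switch goes "off" at block end
```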

Select a Sound to Play.

In an implementation where we can actually move along all the dimensions in the sound space, the new state would be specified using move operators. However, in a more primitive implementation environment where this is not possible, selecting a sound or moving to a new state amounts to picking one of a set of distinguished points. Thus, the space becomes discrete.
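In the discrete case, moving to a new state can be sketched as picking one of the distinguished points (the cue name below is an assumption):

```lisp
;; Sketch: in a discrete sound space, a "move" selects one of a
;; fixed set of digitized sounds rather than varying a dimension.
(afl:local-set-state
 (afl:select-sound afl:*current-audio-state* *warning-cue*))
```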

Examples of Use

Here are some examples of the use of non-speech audio cues in AsTeR.

The following rendering rule for itemized lists uses a sound cue to denote an “audio bullet”, with the sound cue being played before rendering each item of the list. The synchronization provided by play-once ensures that the sound cue for each item in the list is played only after the text from the previous item has been spoken.


    (afl:new-block
     (afl:local-set-state
      (afl:select-sound afl:*current-audio-state* *item-cue*))
     (loop for item in items
           do (afl:play-once)        ; sound the audio bullet
              (read-aloud item)))

An audio highlight is a sound that repeats in the background while text is being spoken. The rendering rule given below audio-highlights the abstract of a technical document using the non-speech primitives. The rule locally selects a sound and turns on the non-speech audio, which causes the sound to repeat in the background. Since the action of turning on the audio is executed within the block opened by the rendering rule for the abstract, the sound is automatically turned off once the abstract has been spoken. Further, since the AFL block is an implicit cobegin statement, it terminates only after all speech activity commenced inside the block has completed; as a consequence, the audio highlight is turned off only after the entire abstract has been spoken.

(def-reading-rule ...
  (afl:new-block
   (afl:local-set-state
    (afl:select-sound afl:*current-audio-state* *abs-cue*))
   (afl:local-set-state
    (afl:switch-on afl:*current-audio-state*))
   ...))

Implementation Details

The following constraints are imposed by the implementation environment:

  1. AsTeR currently uses only digitized sounds. The non-speech space is therefore discrete.
  2. SPARC audio allows only one channel of output, since there is only one sound chip.

The current implementation of non-speech audio uses the Lucid multitasking facility. It also uses the Lucid extensions to Common Lisp for interfacing with existing UNIX utilities and programs written in C.

Audio Player.

Object audio-player provides an abstraction barrier between the external interface to the sound space and the underlying implementation; the interface deals only with object audio-player. An audio-player consists of a sound to be played, a function to play the sound, and a switch to turn the sound on and off. Once an audio-player object has been created, its sound can be changed, and it can be turned on and off using its switch. The external interface to the sound space maps points to the state of the underlying audio player. We can think of object audio-player as the underlying hardware for the sound component of the audio space; thus, we could have one audio-player for each audio component. Object audio-player is implemented so as to allow the use of other sound-generation software that becomes available in the future. Given a function f that generates sound when called with argument s, creating an audio player with function f and sound s yields a uniform interface to the underlying sound-generation software.

AFL blocks and assignments are used to manipulate the external representation of the state of the subspace, and the underlying hardware representation, in this case the audio-player, is automatically updated.
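The description above suggests a small CLOS class. The following is a hypothetical sketch, not AsTeR's actual definition; only player-sound, :sound, and :function appear in the examples in this section, and the remaining slot and method names are assumptions:

```lisp
;; Hypothetical sketch of the audio-player abstraction: a sound,
;; a function that plays it, and a switch to turn it on and off.
(defclass audio-player ()
  ((sound         :initarg :sound    :accessor player-sound)
   (play-function :initarg :function :accessor player-function)
   (switch        :initform nil      :accessor player-switch)))

(defmethod turn-on ((player audio-player))
  "Set the switch and apply the player's function to its sound."
  (setf (player-switch player) t)
  (funcall (player-function player) (player-sound player)))

(defmethod turn-off ((player audio-player))
  (setf (player-switch player) nil))
```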

Using other Sound Generation Tools.

Object audio-player allows the use of other sound generation tools with little modification to the AFL primitives.

play-notes is a simple C program that plays a short beep when called with a set of arguments. A foreign function interface to this C function provides the Lisp counterpart:

  (play-notes &key volume length tone decay octave)

To create an audio player that uses the above to generate sounds, we can write:

  (setf *play-notes*
        (make-instance 'audio-player
                       :function #'play-notes
                       :sound (list :octave "5c")))

We can now turn this object on or off, and also change the note that is played by executing:

  (setf (player-sound *play-notes*)
        (list :octave "6c"))
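The on/off half of this is not shown above; with the switch modeled as a slot, it might be sketched as follows (player-switch is an assumed accessor, by analogy with player-sound):

```lisp
;; Hypothetical sketch: toggling the audio player's switch.
(setf (player-switch *play-notes*) t)    ; turn the sound on
(setf (player-switch *play-notes*) nil)  ; turn it off
```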

Finally, we can implement a new component space around this audio-player object, called the play-notes-space, with its own local state and associated primitives, and manipulate it using AFL constructs.