5.1 Introduction

When perusing printed text, a reader can quickly skip portions of the document, reading only those sections that are of interest. Typeset documents allow such structured browsing by using layout cues to present the underlying document structure; from here, the eye’s ability to “randomly” access portions of the two-dimensional printed page appears to take over. The passive information in a printed document is accessed by an active reader capable of selectively perusing the text. Hence, visual documents themselves need not be interactive.

Things are different with audio. In traditional oral communication this passive-active relationship is reversed: the information flows past a passive listener who has little control over what is heard. The problem is particularly severe when presenting structured information (e.g., complex mathematics); a listener is likely to lose interest by the time the relevant information is presented. Hence, we need to enable active listening, i.e., enable the listener to determine what is heard. Therefore, to be effective, audio documents need to be interactive.

The first step is to make audio documents interactive. Techniques for specifying and modifying how particular objects are rendered were described in Section 4.1. In addition, a browser for audio documents allows a user to interactively traverse the internal high-level representation described in Chapter 2 and listen to portions of interest. The browser provides basic tree-traversal commands. These can be composed to effectively browse the information structure.

The design of our browser is motivated by the conjecture that most visual browsing actions are directed by the underlying structure present in the information. Thus, when we read a complex mathematical expression that involves a fraction, we can quickly look at the numerator while reading the denominator. This single action of looking up at the numerator can be decomposed into a series of atomic tree-traversal movements with respect to the structure of the expression. In the visual context, these actions happen extremely fast, leading to a feeling that the eye can access relevant portions of the visual display almost randomly. However, this notion of randomness disappears when we consider that such visual browsing becomes difficult in a badly formatted document where the underlying structure is not so apparent. Similarly, even when presented with a well-formatted document, a person unfamiliar with the subject matter finds it impossible to perform the same kind of visual browsing. Visual browsing thus depends on familiarity with the underlying structure and a clear rendering of this structure. AsTeR parallels this functionality by building up a rich internal representation and providing a set of atomic actions to traverse this representation. The effectiveness with which a user can browse this representation is now a function of the user’s familiarity with the structure in the subject matter being presented.
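The decomposition of a glance at the numerator into atomic movements can be illustrated with a minimal sketch. It is not the browser's actual implementation: the `Node` and `Browser` classes, the command names `up`, `down`, `left`, and `right`, and the two-operand fraction are all hypothetical and chosen only for illustration. The sketch assumes a tree in which each node knows its parent and its ordered children, and a single current selection that each command moves.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One node of a hypothetical document tree (illustrative only)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Link each child back to this node so that upward moves are possible.
        for child in self.children:
            child.parent = self


class Browser:
    """Holds a current selection and moves it one atomic step at a time."""

    def __init__(self, root: Node) -> None:
        self.current = root

    def _move(self, target: Optional[Node]) -> str:
        # An impossible move leaves the selection where it is.
        if target is not None:
            self.current = target
        return self.current.label

    def _sibling(self, offset: int) -> Optional[Node]:
        parent = self.current.parent
        if parent is None:
            return None
        siblings = parent.children
        here = next(i for i, s in enumerate(siblings) if s is self.current)
        there = here + offset
        return siblings[there] if 0 <= there < len(siblings) else None

    def up(self) -> str:
        return self._move(self.current.parent)

    def down(self) -> str:
        kids = self.current.children
        return self._move(kids[0] if kids else None)

    def left(self) -> str:
        return self._move(self._sibling(-1))

    def right(self) -> str:
        return self._move(self._sibling(+1))


# A fraction with numerator (a, b) and denominator (c, d).
fraction = Node("fraction", [
    Node("numerator", [Node("a"), Node("b")]),
    Node("denominator", [Node("c"), Node("d")]),
])

browser = Browser(fraction)
# Walk to the last operand of the denominator.
trail = [browser.down(), browser.right(), browser.down(), browser.right()]
# "Look up at the numerator" is then two atomic moves: up, then left.
glance = [browser.up(), browser.left()]
```

In this sketch, a move that cannot be made leaves the current selection unchanged. That is only one possible design choice; a real browser might instead cue the listener that the move failed.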

We present the browser as follows: Section 5.2 motivates the need for a browser by analyzing how visual browsing works. Based on this, we derive a corresponding model for audio browsing. We identify a set of atomic browsing actions that enable general browsing. Section 5.3 describes how a user can traverse the high-level representation of a document. This section introduces the concept of a current selection and describes how the user is unobtrusively cued to the nature of the current selection. Section 5.4 describes how the listener can execute actions after setting the current selection. These actions include listening to the current selection, rendering it relative to its parent, and listening to the rest of the document. Cross-references form an important component of technical documents and are described in Section 5.5. A particularly difficult problem faced when listening to mathematical texts on conventional talking books, or even when reading printed mathematical texts, is keeping track of equation numbers and understanding statements that refer to equations and theorems by their numbers in the running text. We describe a flexible mechanism that allows a listener to annotate cross-referenceable objects with meaningful labels that can be used to refer to such objects in later cross-references. This section also describes how places of interest in a document can be marked using a bookmark facility. Appendix A.5 documents the external interface to the browser. The browser, along with the ability to change rendering rules and styles, makes audio documents produced by AsTeR fully interactive.