5.2 How Does Browsing Work?

Communication through the Printed Medium

As a first step towards developing an effective audio analogue, let us examine communication through the printed page. The printed page is passive: it is a two-dimensional visual display with marks on it. The person reading the printed page can either scan the material linearly or browse through parts of the document. Visual layout (the way the marks appear on paper) enables such browsing. Thus, rather than laying all the text in a naïve manner on the page, we exploit concepts such as line and paragraph breaks to allow the reader to perceive chunks of the printed matter and to selectively read specific portions of the text being presented.

The power of the printed medium lies in the eye’s ability to browse text laid out on a two-dimensional display. When reading a paper, we are able to skim through the text, focusing on paragraphs of interest, and quickly scan across to the bottom of a page when we see a reference being made to a footnote.

The Audio Setting

The previous paragraph adopted the metaphor of a document being marks on paper. In contrast, in the audio setting, we have the ear, which is passive, and a document that is scrolling away in a linear fashion. This makes the goal of achieving an audio analogue to the printed page seemingly difficult.

An Alternative Model

The eye is certainly capable of moving to any point on the page extremely rapidly. Yet, when we browse, we do not move randomly about the printed page. Typically, we move to the next paragraph, next line, or previous word. This seems to indicate that the eye infers some structure in the printed document, which is used to move around effectively. Since each of these actions is performed extremely rapidly, owing to the eye’s inherent scanning ability, these atomic actions are difficult to pinpoint.

We therefore conjecture the following: Every well-formatted document presents inherent logical structure, which the eye is capable of perceiving. All visual browsing actions can be characterized as movements around this structure.

A Naïve Example

Consider a well-formatted document containing no mathematical formulae. Here, the layout structure is a tree whose root node is the page and whose children are the paragraphs. At the next level of this tree, we have the lines; each line is further broken up into words, and words are in turn broken up into characters. Given this structure, we can rephrase all of the browsing actions as combinations of simple tree traversal movements. Thus, we can identify the following atomic actions:

  1. Go to next sibling.
  2. Go to previous sibling.
  3. Go to parent.
  4. Go to leftmost child.
  5. Go to rightmost child.
  6. Mark current node.
  7. Return to marked node.

Using the above atomic actions and their various combinations, we can define all the browsing actions that the eye is capable of performing.
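The seven atomic actions can be sketched as a cursor over a tree with parent pointers. This is a minimal illustration, not the system's actual implementation: the names `Node`, `Cursor`, and all method names are hypothetical, and each movement simply reports whether it succeeded.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison, so equal-looking siblings stay distinct
class Node:
    """One unit of document structure: page, paragraph, line, word, ..."""
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach a child and return it, so trees can be built step by step."""
        child.parent = self
        self.children.append(child)
        return child


class Cursor:
    """Hypothetical reading position; movements return True when they succeed."""

    def __init__(self, root: Node) -> None:
        self.current = root
        self.marked: Optional[Node] = None

    def _move(self, target: Optional[Node]) -> bool:
        if target is None:
            return False  # stay put when there is nowhere to go
        self.current = target
        return True

    def _sibling(self, offset: int) -> Optional[Node]:
        parent = self.current.parent
        if parent is None:
            return None
        index = parent.children.index(self.current) + offset
        return parent.children[index] if 0 <= index < len(parent.children) else None

    def next_sibling(self) -> bool:       # action 1
        return self._move(self._sibling(+1))

    def previous_sibling(self) -> bool:   # action 2
        return self._move(self._sibling(-1))

    def to_parent(self) -> bool:          # action 3
        return self._move(self.current.parent)

    def leftmost_child(self) -> bool:     # action 4
        kids = self.current.children
        return self._move(kids[0] if kids else None)

    def rightmost_child(self) -> bool:    # action 5
        kids = self.current.children
        return self._move(kids[-1] if kids else None)

    def mark(self) -> None:               # action 6
        self.marked = self.current

    def return_to_mark(self) -> bool:     # action 7
        return self._move(self.marked)


# A tiny page: two paragraphs, the first with two lines.
page = Node("page")
first = page.add(Node("paragraph 1"))
second = page.add(Node("paragraph 2"))
first.add(Node("line 1"))
first.add(Node("line 2"))

cursor = Cursor(page)
cursor.leftmost_child()    # page -> paragraph 1
cursor.rightmost_child()   # paragraph 1 -> line 2
cursor.mark()
cursor.to_parent()         # line 2 -> paragraph 1
cursor.next_sibling()      # paragraph 1 -> paragraph 2
cursor.return_to_mark()    # back to line 2
print(cursor.current.label)  # line 2
```

Returning a boolean rather than raising an error is a design choice of this sketch: a browsing interface can then give feedback such as "no next sibling" without losing the current position.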

Thus, on encountering a reference to a footnote while reading, we:

  1. Mark current node.
  2. Go to parent (this gets us out of the current paragraph).
  3. Move across siblings until footnote is located.
  4. Read footnote.
  5. Return to marked node.
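The five steps above can be traced on a small example. The sketch below makes several assumptions that are not in the text: the page is stored as nested dictionaries, a position is a tuple of child indices counted from the root, the footnote is a later sibling of the paragraph and carries the kind `"footnote"`, and the helper names `node_at` and `read_footnote` are hypothetical.

```python
def node_at(root, path):
    """Follow a path of child indices down from the root."""
    node = root
    for index in path:
        node = node["children"][index]
    return node


def read_footnote(root, path):
    """Return (footnote_text, resume_path) for a reader positioned at `path`."""
    marked = path                                   # 1. Mark current node.
    path = path[:-1]                                # 2. Go to parent.
    if not path:
        return None, marked                         # parent is the root: no siblings
    siblings = node_at(root, path[:-1])["children"]
    for index in range(path[-1] + 1, len(siblings)):  # 3. Move across siblings.
        if siblings[index]["kind"] == "footnote":
            text = siblings[index]["text"]          # 4. Read footnote.
            return text, marked                     # 5. Return to marked node.
    return None, marked                             # no footnote found; resume anyway


page = {"kind": "page", "children": [
    {"kind": "paragraph", "children": [
        {"kind": "line", "text": "Browsing relies on layout.", "children": []},
        {"kind": "line", "text": "See the note below.", "children": []},
    ]},
    {"kind": "paragraph", "children": [
        {"kind": "line", "text": "A second paragraph.", "children": []},
    ]},
    {"kind": "footnote", "text": "Layout means the marks on paper.", "children": []},
]}

# The reader is on the second line of the first paragraph.
text, resume = read_footnote(page, (0, 1))
print(text)
print(node_at(page, resume)["text"])
```

The reader's position is unchanged after the detour: `resume` is the same path that was marked in step 1, so reading continues on the line that contained the footnote reference.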

A Complex Example

Consider the following expression as read by a person familiar with mathematical notation:

$$\int \frac{e^{-x^{2}} + e^{x^{3}}}{\sin^{2} x + \cos^{2} x}\, dx$$

The experienced reader is able to quickly scan the above expression and, while perusing the denominator, access the numerator. This ability is a consequence of internalizing the underlying structure conveyed by the visual layout and using it to traverse the information. The atomic actions in accessing the numerator are:

  1. Mark current node.
  2. Read previous sibling.
  3. Return to marked node.
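The same three steps can be illustrated on a tree for the expression. The sketch assumes one plausible structure (an integral whose children are a fraction and the differential, with the numerator and denominator as the fraction's two children); the class `Expr`, the helper names, and the string labels are hypothetical and are not taken from the system described here.

```python
class Expr:
    """A node of the expression tree; children learn their parent on creation."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def previous_sibling(node):
    """Return the sibling just before `node`, or None if there is none."""
    if node.parent is None:
        return None
    siblings = node.parent.children
    index = siblings.index(node)  # default equality is identity for plain classes
    return siblings[index - 1] if index > 0 else None


def peek_previous_sibling(current):
    """Read the previous sibling, then resume at the original position."""
    marked = current                        # 1. Mark current node.
    sibling = previous_sibling(current)     # 2. Read previous sibling.
    spoken = sibling.label if sibling is not None else None
    return spoken, marked                   # 3. Return to marked node.


numerator = Expr("e^{-x^2} + e^{x^3}")
denominator = Expr("sin^2 x + cos^2 x")
integral = Expr("integral", Expr("fraction", numerator, denominator), Expr("dx"))

# While perusing the denominator, glance back at the numerator.
spoken, position = peek_previous_sibling(denominator)
print(spoken)
print(position is denominator)
```

Because the numerator and denominator are adjacent children of the same fraction node, a single sibling move is enough; no search across the rest of the expression is needed.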

We enable audio browsing by allowing a listener to perform the same kind of traversals. AsTeR internalizes a sufficiently rich structure to permit all of these browsing actions.