The recognizer used in AsTeR captures logical structure present in documents encoded in the TeX family of languages. An important feature of this recognizer is that it works on the entire gamut of encodings, ranging from plain ASCII documents, i.e., documents with no explicit markup, up to documents containing completely unambiguous encodings of the logical structure.
The basic document model used in AsTeR is the attributed tree. Each hierarchical level of the document is modeled as a node in this tree. Each node can have content, children and attributes. Using object-oriented terminology, each different kind of node of the tree is called an object. Thus, ``chapter'', ``section'', ``paragraph'', and ``sentence'' are all objects. If a document contained five sections, its representation in AsTeR would have five instances of object ``section''. This object-oriented terminology is used because AsTeR actually uses CLOS objects in this fashion. The use of an object-oriented language was instrumental in allowing us to develop and implement the ideas in AsTeR incrementally and effectively.
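The attributed-tree model can be illustrated with a short Python sketch. AsTeR itself uses CLOS; here Python classes stand in for CLOS classes. The names `DocumentNode`, `Chapter`, `Section`, `Paragraph`, `Sentence` and `count_instances`, and the use of a dictionary for attributes, are hypothetical illustrations rather than AsTeR's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentNode:
    """One node of an attributed tree: content, children and attributes."""
    content: Any = None
    children: list["DocumentNode"] = field(default_factory=list)
    attributes: dict[str, Any] = field(default_factory=dict)


# Each kind of node is its own class, i.e. an ``object'' in the text's terminology.
class Chapter(DocumentNode):
    """A chapter object."""


class Section(DocumentNode):
    """A section object."""


class Paragraph(DocumentNode):
    """A paragraph object."""


class Sentence(DocumentNode):
    """A sentence object."""


def count_instances(node: DocumentNode, kind: type) -> int:
    """Count instances of one object kind in the subtree rooted at node."""
    own = 1 if isinstance(node, kind) else 0
    return own + sum(count_instances(child, kind) for child in node.children)


# A chapter with five sections, each holding one paragraph of one sentence.
chapter = Chapter(
    attributes={"title": "Introduction"},
    children=[
        Section(children=[Paragraph(children=[Sentence(content=f"Sentence {i}.")])])
        for i in range(5)
    ],
)
print(count_instances(chapter, Section))  # 5
```

As in the text, a document with five sections yields five instances of the ``section'' object.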
This attributed tree structure is augmented to represent mathematical content; we call this augmented representation the quasi-prefix form (see Figure 1). Expressions that are completely unambiguous, e.g., [tex2html_wrap313], are captured in their prefix form. In addition to linearizing the underlying tree structure, mathematical notation uses visual attributes such as superscripts and subscripts, whose interpretation is context-dependent. We extend the prefix form to capture such visual attributes, hence the name quasi-prefix.
Figure 1: A math object with attributes. Each of the attributes itself contains math objects.
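A minimal Python sketch of such a math object follows, assuming a hypothetical `MathObject` class that holds an operator (or leaf symbol), a list of prefix-form arguments, and a dictionary of visual attributes such as `superscript` and `subscript` whose values are themselves math objects, as in Figure 1. The bracketed print notation produced by `to_quasi_prefix` is our own debugging aid, not AsTeR's internal syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MathObject:
    """A quasi-prefix node: prefix operator and arguments plus visual attributes."""
    head: str                                   # operator name or leaf symbol
    args: list[MathObject] = field(default_factory=list)
    # Visual attributes hold further math objects; whether a superscript means
    # a power, an index or something else is deliberately left uninterpreted.
    attributes: dict[str, MathObject] = field(default_factory=dict)

    def to_quasi_prefix(self) -> str:
        """Render as a parenthesized prefix string with [key=value] attributes."""
        text = self.head
        if self.attributes:
            attrs = " ".join(f"{key}={value.to_quasi_prefix()}"
                             for key, value in sorted(self.attributes.items()))
            text += f"[{attrs}]"
        if self.args:
            parts = [text] + [arg.to_quasi_prefix() for arg in self.args]
            text = "(" + " ".join(parts) + ")"
        return text


# x_i^2 + y: the sum is in prefix form; x carries two visual attributes.
x = MathObject("x", attributes={"subscript": MathObject("i"),
                                "superscript": MathObject("2")})
expr = MathObject("plus", args=[x, MathObject("y")])
print(expr.to_quasi_prefix())  # (plus x[subscript=i superscript=2] y)
```

The operator structure of the sum is fixed, while the attributes on `x` are stored without committing to a meaning, which is the delayed interpretation the quasi-prefix form is designed to allow.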
A key feature of the quasi-prefix form is that it delays the assignment of semantic interpretation to instances of ambiguous written mathematics. At the same time, it is sufficiently rich to permit renderings that are independent of the order in which the written symbols would appear on paper. Linear renderings, with the rendering order hard-coded into the system, can be produced with a simpler representation, e.g., a linear list, or even the TeX encoding itself. This was demonstrated by TeXTALK, a program based on string substitution that directly transformed TeX source into spoken renderings [Ram92][Ram91].
As an example, assume that \kronecker is defined as an infix binary operator. Given the expression

$$a \otimes b$$

encoded as

    a \kronecker b

we can represent it in the quasi-prefix form by a tree whose root is the object kronecker, and write rendering rules for the object kronecker that produce either ``a kronecker product b'' or ``kronecker product of a and b''. TeXTALK can produce the former rendering as well, but a simpler list-like representation restricts the system to this single form of rendering.
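The contrast can be sketched in Python. The sketch assumes only what the text states: the expression has been recognized as a tree whose root is a kronecker object with operands a and b. The function names `render_by_substitution` and `render_tree`, the `RULES` table and its style keys are hypothetical; the substitution function merely imitates the spirit of TeXTALK and is not its actual code.

```python
import re


def render_by_substitution(tex: str) -> str:
    """TeXTALK-style: rewrite the macro in place, so source order fixes spoken order."""
    return re.sub(r"\\kronecker\b", "kronecker product", tex).strip()


# Rendering rules keyed by (object kind, style); each receives rendered operands.
RULES = {
    ("kronecker", "infix"): lambda a, b: f"{a} kronecker product {b}",
    ("kronecker", "prefix"): lambda a, b: f"kronecker product of {a} and {b}",
}


def render_tree(tree, style: str = "infix") -> str:
    """Render a (head, *operands) tuple recursively; leaves are plain strings."""
    if isinstance(tree, str):
        return tree
    head, *operands = tree
    spoken = [render_tree(operand, style) for operand in operands]
    return RULES[(head, style)](*spoken)


tree = ("kronecker", "a", "b")  # tree for the encoding  a \kronecker b

print(render_by_substitution(r"a \kronecker b"))  # a kronecker product b
print(render_tree(tree, "infix"))                 # a kronecker product b
print(render_tree(tree, "prefix"))                # kronecker product of a and b
```

Because the rule is chosen per object kind rather than dictated by the order of tokens in the source, adding another rendering style means adding a rule, not rewriting the input.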
In producing printed output, one view is sufficient; once the information has been presented visually, a reader can access it in any desired order. Even with visual rendering, however, different views may be desired. For example, one may want a view that gives only the table of contents of a paper. Or, for a document that presents an algorithm, one view could give the whole presentation and a second could present only an overview of the algorithm. See [Lam93] for a discussion of the hierarchical presentation of proofs. The linearity of audio makes it essential that AsTeR be able to present multiple views. Lack of this feature is one of the major shortcomings of books on tape, where the listener is restricted to the single view presented by the person speaking the text. AsTeR allows the listener to explore the material in the same way as a person perusing printed material, and thereby enables active listening.
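Because the document is held as a tree rather than as a fixed stream, a ``view'' can be thought of as a traversal together with a predicate that decides which nodes to present. The Python sketch below assumes a hypothetical `Node` class with `kind`, `title`, `text` and `children` fields, and hypothetical predicates `toc_view` and `full_view`; it illustrates the idea and is not AsTeR's browsing interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    kind: str                      # e.g. "chapter", "section", "paragraph"
    title: str = ""
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def view(node: Node, visit: Callable[[Node], bool], depth: int = 0) -> Iterator[str]:
    """Pre-order traversal yielding one indented line per node the view presents."""
    if visit(node):
        label = node.title or node.text
        yield "  " * depth + f"{node.kind}: {label}"
    for child in node.children:
        yield from view(child, visit, depth + 1)


def toc_view(node: Node) -> bool:
    """Table-of-contents view: only titled structural units."""
    return node.kind in {"chapter", "section"}


def full_view(_node: Node) -> bool:
    """Full view: every node is presented."""
    return True


doc = Node("chapter", title="Recognizing structure", children=[
    Node("section", title="Document model", children=[
        Node("paragraph", text="The basic document model is the attributed tree.")]),
    Node("section", title="Quasi-prefix form", children=[
        Node("paragraph", text="The tree is augmented to represent mathematics.")]),
])

print("\n".join(view(doc, toc_view)))
# chapter: Recognizing structure
#   section: Document model
#   section: Quasi-prefix form
```

An overview-only view of an algorithm would be another predicate of the same shape over the same tree.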