2.2 Representing Mathematical Content

We have designed an internal representation, called the quasi-prefix form, for handling mathematical content. It captures the full prefix form of mathematical expressions with operators and simple variables. The tree corresponding to x + y has root + and children x and y and is represented as such internally.

In addition to linearizing the underlying tree structure, mathematical notation uses visual attributes such as superscripts and subscripts. We extend the prefix form to capture such visual attributes, hence the name quasi-prefix.

The key feature of the quasi-prefix form is that it delays the assignment of semantic interpretation to instances of ambiguous written mathematics. For example, the superscripts in an expression are represented not as exponents but as attribute superscript, because the meaning of these visual attributes is context dependent. Assigning one of the several possible interpretations at the recognition step is unduly restrictive in a fully flexible rendering system. For example, interpreting every superscript as an exponent would result in $x^2$ being recognized correctly, but $A^T$ being recognized incorrectly. Further, it would be impossible to later distinguish between the correct and incorrect interpretations. The quasi-prefix form captures the mathematical notation itself, leaving the assignment of semantic interpretation to a later step. By doing so, we can represent content for which we do not have sufficient semantic information. Thus, $D_x^1 u$ might denote the first derivative of u with respect to x in a specific context. The superscript and subscript might mean something entirely different in another context, e.g., as in Dnan. If more contextual information is available at the rendering step, AsTeR can speak $A^T$ as “cap a transpose”. In the absence of such contextual information, the system can still produce an audio notation that maps different features of the written notation to unique audio dimensions.
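The idea of deferring interpretation can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the class `MathObject`, the function `speak`, the `transposed` parameter, and the spoken wording are all hypothetical names chosen for illustration. The superscript is stored only as a visual attribute, and a meaning is chosen at rendering time if context is supplied.

```python
from dataclasses import dataclass, field


@dataclass
class MathObject:
    """Hypothetical node: content, operand children, and visual attributes."""
    content: str
    children: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


def speak(obj, transposed=frozenset()):
    """Render obj as words. `transposed` is assumed context: the symbols
    whose superscript T is known to mean transpose."""
    if obj.children:
        # Prefix form: a binary operator node with two operands, spoken infix.
        left, right = (speak(child, transposed) for child in obj.children)
        word = {"+": "plus"}.get(obj.content, obj.content)
        return f"{left} {word} {right}"
    text = obj.content
    sup = obj.attributes.get("superscript")
    if sup is not None:
        if sup.content == "T" and obj.content in transposed:
            text += " transpose"  # meaning chosen only at rendering time
        else:
            text += " superscript " + speak(sup, transposed)  # notation only
    return text


x_plus_y = MathObject("+", children=[MathObject("x"), MathObject("y")])
a_sup_t = MathObject("A", attributes={"superscript": MathObject("T")})

print(speak(x_plus_y))                    # x plus y
print(speak(a_sup_t))                     # A superscript T
print(speak(a_sup_t, transposed={"A"}))   # A transpose
```

The same stored object yields either rendering, which would not be possible had the recognizer committed to "exponent" when the expression was read.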

At the same time, the quasi-prefix form is sufficiently rich to permit renderings that are independent of the order in which the written symbols appear on paper. Linear renderings with the rendering order hard-coded into the system can be produced with a simpler representation, e.g., a linear list, or even the TeX encoding itself. This was shown by TeXTalk, a string-substitution based program that directly transformed TeX source to produce linear renderings [Ram91, Ram92].

As an example, assume that \kronecker is defined as an infix binary operator. Given the expression a ⊗ b encoded as $a\kronecker b$, we can write a rendering rule for object kronecker represented in the quasi-prefix form to produce “a kronecker product b”. This rendering can be produced by TeXTalk as well, but a simpler list-like representation restricts the system to this one form of rendering. Using the quasi-prefix form, AsTeR can also produce “the kronecker product of a and b”.
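As a rough illustration of why a tree permits more than one rendering, the following Python sketch applies two interchangeable rule tables to the same operator node. The names `Node`, `render`, `INFIX_RULES`, and `DESCRIPTIVE_RULES` are hypothetical and do not reflect the system's actual rule mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical prefix-form node: an operator (or symbol) and operands."""
    head: str
    operands: list = field(default_factory=list)


# Two assumed rule tables for the same object type.
INFIX_RULES = {"kronecker": "{0} kronecker product {1}"}
DESCRIPTIVE_RULES = {"kronecker": "the kronecker product of {0} and {1}"}


def render(node, rules):
    """Render a node by looking up the template for its operator."""
    if not node.operands:
        return node.head  # a plain symbol is spoken as its name
    parts = [render(operand, rules) for operand in node.operands]
    return rules[node.head].format(*parts)


expr = Node("kronecker", [Node("a"), Node("b")])
print(render(expr, INFIX_RULES))        # a kronecker product b
print(render(expr, DESCRIPTIVE_RULES))  # the kronecker product of a and b
```

The second rendering speaks the operator before its operands, so it cannot be obtained by substituting strings in the order they appear in the linear encoding.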

Thus, even though the quasi-prefix form captures only the information present in the TeX encoding, it is still flexible enough to permit more sophisticated processing.

This power is necessary in overcoming the passive nature of listening. In producing printed output, it is sufficient to produce one view; once the information has been presented visually, a person reading the material can access it in any desired order. TeX itself therefore never builds up an internal representation like the quasi-prefix form; its purpose is to typeset the input according to a fixed set of rules, and the TeX encoding directly reflects the linear order in which expressions appear on paper. Here, the displayed information is passive while the person reading it is active. The situation in presenting information orally is exactly the opposite; the information flows past a passive listener. In order to achieve effective oral communication, it is therefore important to be able to present multiple views of the information.

Math Object Encapsulates Quasi-Prefix Form

To represent the quasi-prefix form, we extend the attributed tree model defined in the previous section with object math object. We define six attributes for this object, shown in Figure 2.1 on page 34.





left-superscript      accent      superscript
                ↖       ↑       ↗
                   math object
                ↙       ↓       ↘
left-subscript       underbar      subscript




Figure 2.1: A math object with attributes. Each attribute itself contains a math object.

A math object may have any or all of these attributes. An attribute can have a math object as content.
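A minimal Python sketch of such an object follows. The attribute names are those of Figure 2.1; everything else (the class `MathObject`, the method `attach`, the helper `depth`) is a hypothetical illustration rather than the system's own definition. It shows that any subset of the six attributes may be present and that attribute contents are themselves math objects.

```python
from dataclasses import dataclass, field

# The six attribute names from Figure 2.1.
ATTRIBUTE_NAMES = (
    "left-superscript", "accent", "superscript",
    "left-subscript", "underbar", "subscript",
)


@dataclass
class MathObject:
    """Hypothetical math object: content plus optional visual attributes."""
    content: str
    attributes: dict = field(default_factory=dict)

    def attach(self, name, value):
        """Set one of the six attributes to another math object."""
        if name not in ATTRIBUTE_NAMES:
            raise ValueError(f"unknown attribute: {name}")
        self.attributes[name] = value
        return self


def depth(obj):
    """Attribute nesting depth: 0 for an object with no attributes."""
    return max((1 + depth(value) for value in obj.attributes.values()), default=0)


def x_sub_1_sup_k():
    """Build x with subscript 1 and superscript k."""
    return (MathObject("x")
            .attach("subscript", MathObject("1"))
            .attach("superscript", MathObject("k")))


inner = x_sub_1_sup_k()
# The same kind of object can in turn serve as an attribute of another x.
outer = (MathObject("x")
         .attach("subscript", x_sub_1_sup_k())
         .attach("superscript", x_sub_1_sup_k()))

print(depth(inner), depth(outer))  # 1 2
```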

Here are the basic object types in this representation:

The structure is recursive. For example, $x_1^k$ is represented by the math object

The representation can capture mathematical expressions with arbitrarily complex visual attributes. Let M denote the math object shown above. Then

$x^{x_1^k}_{x_1^k}$

would be represented by math object M shown below:

Refining the Quasi-Prefix Form

In Section 2.1, we mentioned that all objects in our document model are linked. This is true of the objects appearing in the quasi-prefix representation. Each node in the tree is linked to its parent, as well as to its previous and next siblings. Math attributes have their parent link set to the object being attributed.
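The linking described above can be sketched in Python as follows. The field names `parent`, `previous_sibling`, and `next_sibling`, and the function `link`, are assumptions made for this illustration; the source does not give the system's actual slot names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical tree node with children, attributes, and links."""
    content: str
    children: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)
    # Link fields are kept out of repr to avoid infinite recursion on cycles.
    parent: Optional["Node"] = field(default=None, repr=False)
    previous_sibling: Optional["Node"] = field(default=None, repr=False)
    next_sibling: Optional["Node"] = field(default=None, repr=False)


def link(node, parent=None):
    """Set parent and sibling links throughout the tree rooted at node."""
    node.parent = parent
    previous = None
    for child in node.children:
        link(child, node)
        child.previous_sibling = previous
        if previous is not None:
            previous.next_sibling = child
        previous = child
    for value in node.attributes.values():
        # An attribute's parent is the object being attributed.
        link(value, node)
    return node


x = Node("x", attributes={"superscript": Node("2")})
y = Node("y")
total = link(Node("+", children=[x, y]))

print(x.next_sibling is y, y.parent is total)  # True True
```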

We refine the quasi-prefix form by adding the following subtypes. This makes recognizing and handling complex mathematical content cleaner.

We first introduce object math subformula, which is used to capture subexpressions appearing within the { and } of (La)TeX. Object math subformula can be thought of as being the math equivalent of object text block described in Section 2.1. It has the following structure:

Object math subformula can be intuitively thought of as a dummy object that encapsulates an expression.

We need object math subformula to represent expressions of the form:

$$\overbrace{x + \cdots + x}^{k\ \text{times}}$$

$$\underbrace{x + y + z}_{> 0}$$

In representing each of the above examples, object math subformula is essential in capturing the expression to which the overbrace/underbrace applies.

To enable recognition of written mathematics, tokens have to be appropriately classified. Our classification of tokens when processing written mathematics is inspired by Appendix F of The TeXbook [Knu84]:

The symbols divide naturally into groups based on their mathematical class (Ord, Op, Bin, Rel, Open, Close, or Punct), …

We introduce subtypes of object math object to correspond to each token type:

Written mathematical notation uses juxtaposition as an infix operator. Juxtaposition, as in a(b + c), mostly denotes multiplication, but can mean function application in certain contexts, as in f(x + y). We introduce a new operator to represent juxtaposition, and to define it precisely, we also assert that all mathematical variables are single letters. Thus, cab is represented as the juxtaposition of three ordinary objects. This assertion is not specific to our internal representation; rather, it specifies the concrete syntax used in the electronic markup and reflects the choice made in the design of TeX. We do allow mathematical variables made up of more than one character, but these should be clearly marked up as such by using \mbox, as in $\mbox{cab}=cab$, where the left-hand side is a single variable and the right-hand side is the juxtaposition of c, a and b.
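The single-letter convention can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it handles only letters and \mbox{...} groups, ignores all other TeX syntax, and the names `variables` and `parse` are hypothetical.

```python
import re

# Either an \mbox{...} group (one multi-letter variable) or a single letter.
TOKEN = re.compile(r"\\mbox\{([^}]*)\}|([A-Za-z])")


def variables(source):
    """Split simplified TeX-like math source into variable names."""
    return [
        match.group(1) if match.group(1) is not None else match.group(2)
        for match in TOKEN.finditer(source)
    ]


def parse(source):
    """Return one variable, or a prefix-form juxtaposition of several."""
    names = variables(source)
    if not names:
        raise ValueError("no variables found")
    if len(names) == 1:
        return names[0]
    return ("juxtaposition", *names)


print(parse("cab"))           # ('juxtaposition', 'c', 'a', 'b')
print(parse(r"\mbox{cab}"))   # cab
```

The unmarked string yields three ordinary objects under one juxtaposition operator, while the \mbox form yields a single variable.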

The classification of a math object is defined using the following command:

    (define-math-classification <token> <classification>)

In certain special cases, the predefined classification shown above can be modified. A good example of this is recognizing a mathematical text that consistently uses the letters f, g and h to denote functions. Using the predefined classification, the recognizer would treat f as object ordinary, leading to f(x) being represented as the juxtaposition of two objects, namely, f and (x). Declaring f to be a mathematical function by executing

    (define-math-classification f mathematical-function-name)

results in occurrences of f being treated as a function. Hence, f(x) is correctly recognized as a function application. Note that the correct interpretation of such notation is more important for browsing than for speaking the expression.
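The effect of reclassifying a token can be mimicked in a few lines of Python. This is only an analogy to the command above, not its implementation: the table `CLASSIFICATION`, its default entries, and the functions `define_math_classification`, `classify`, and `recognize` are hypothetical, and the recognizer is reduced to a single decision.

```python
# Assumed default table; tokens not listed are treated as "ordinary".
CLASSIFICATION = {"+": "binary-operator", "(": "open", ")": "close"}


def define_math_classification(token, classification):
    """Record or override the classification of a token."""
    CLASSIFICATION[token] = classification


def classify(token):
    """Look up a token's classification, defaulting to ordinary."""
    return CLASSIFICATION.get(token, "ordinary")


def recognize(head, argument):
    """Recognize a symbol followed by a parenthesized subexpression."""
    if classify(head) == "mathematical-function-name":
        return ("function-application", head, argument)
    return ("juxtaposition", head, argument)


before = recognize("f", "(x)")
define_math_classification("f", "mathematical-function-name")
after = recognize("f", "(x)")

print(before)  # ('juxtaposition', 'f', '(x)')
print(after)   # ('function-application', 'f', '(x)')
```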