[Next] [Up] [Previous]
Next: Constructing high-level representations Up: Representing mathematical content Previous: Math object encapsulates

Refining the quasi-prefix form

In the section on high-level models (s:high-level-models), we mentioned that all objects in our document model are linked. This is also true of the objects appearing in the quasi-prefix representation. Each node in the tree is linked to its parent, as well as to its previous and next siblings. Math attributes have their parent link set to the object being attributed.
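The linking described above can be pictured with a small Python sketch. It is only an illustration, not the system's own implementation: the class name MathNode, its field names, and the methods add_child and add_attribute are all assumptions introduced here.

```python
class MathNode:
    """Hypothetical tree node carrying parent and sibling links."""

    def __init__(self, contents):
        self.contents = contents
        self.parent = None      # link to the enclosing node
        self.previous = None    # link to the previous sibling
        self.next = None        # link to the next sibling
        self.children = []
        self.attributes = {}    # e.g. "subscript", "superscript"

    def add_child(self, child):
        # Set the parent link and thread the sibling links.
        child.parent = self
        if self.children:
            last = self.children[-1]
            last.next = child
            child.previous = last
        self.children.append(child)
        return child

    def add_attribute(self, name, value):
        # An attribute's parent link points at the object being attributed.
        value.parent = self
        self.attributes[name] = value
        return value
```

With these links in place, a traversal can move from any node to its parent or to either sibling without searching from the root.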

We refine the quasi-prefix form by adding the following subtypes, which make recognizing and handling complex mathematical content cleaner.

We first introduce object math subformula, which is used to capture subexpressions appearing within the [tex2html_wrap5306] and [tex2html_wrap5308] of (La)TeX. Object math subformula can be thought of as the math equivalent of object text block described in s:high-level-models. It has the following structure:

Object math subformula can be intuitively thought of as a dummy object that encapsulates an expression.

We need object math subformula to represent expressions of the form:

[displaymath5302]

[displaymath5303]

In representing each of the above examples, object math subformula is essential in capturing the expression to which the overbrace/underbrace applies.
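As a rough illustration of why a wrapper object helps here, the Python sketch below lets a brace refer to exactly one object, however many terms the braced expression contains. The class names Subformula and Overbrace and their fields are hypothetical and are not taken from the system itself.

```python
class Subformula:
    """Hypothetical dummy object that encapsulates an expression."""

    def __init__(self, *contents):
        self.contents = list(contents)


class Overbrace:
    """Hypothetical overbrace that applies to a single subformula."""

    def __init__(self, body, label=None):
        # Wrap a bare expression so the brace always spans one object.
        if not isinstance(body, Subformula):
            body = Subformula(body)
        self.body = body
        self.label = label


# The braced expression a + b + c is captured as one unit.
braced = Overbrace(Subformula("a", "+", "b", "+", "c"), label="n")
```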

To enable recognition of written mathematics, tokens have to be appropriately classified. Our classification of tokens when processing written mathematics is inspired by Appendix F of The TeXbook [Knu84].

The symbols divide naturally into groups based on their mathematical class (Ord, Op, Bin, Rel, Open, Close, or Punct), [tex2html_wrap5310]

We introduce subtypes of object math object to correspond to each token type:
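One way to picture a subtype per token class is the Python sketch below. The seven class names come from the TeX classification quoted above; every Python class name, the table SUBTYPE_FOR_CLASS, and the helper make_math_object are assumptions made for illustration.

```python
class MathObject:
    """Hypothetical base type for all math objects."""
    tex_class = None

    def __init__(self, contents):
        self.contents = contents


# One hypothetical subtype per TeX math class.
class Ordinary(MathObject):
    tex_class = "Ord"

class BigOperator(MathObject):
    tex_class = "Op"

class BinaryOperator(MathObject):
    tex_class = "Bin"

class RelationalOperator(MathObject):
    tex_class = "Rel"

class OpenDelimiter(MathObject):
    tex_class = "Open"

class CloseDelimiter(MathObject):
    tex_class = "Close"

class Punctuation(MathObject):
    tex_class = "Punct"


# Lookup table from TeX class name to subtype.
SUBTYPE_FOR_CLASS = {
    cls.tex_class: cls
    for cls in (Ordinary, BigOperator, BinaryOperator, RelationalOperator,
                OpenDelimiter, CloseDelimiter, Punctuation)
}


def make_math_object(contents, tex_class):
    """Build the subtype that corresponds to the given TeX class."""
    return SUBTYPE_FOR_CLASS[tex_class](contents)
```

Dispatching on the subtype, rather than on a string tag, lets later processing treat, say, relational operators differently from binary operators without re-examining the token.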

Written mathematical notation uses juxtaposition as an infix operator. Juxtaposition, as in [tex2html_wrap5340], mostly denotes multiplication, but can mean function application in certain contexts, e.g., [tex2html_wrap5342]. We introduce a new operator to represent juxtaposition, and to define it precisely, we also assert that all mathematical variables are single letters. Thus, [tex2html_wrap5344] is represented as the juxtaposition of three ordinary objects. This assertion is not specific to our internal representation; rather, it specifies the concrete syntax used in the electronic markup and reflects the choice made in the design of TeX. We do allow mathematical variables made up of more than one character, but these must be clearly marked up as such, e.g., as [tex2html_wrap5346], by using \mbox as in $\mbox{cab}=cab$.
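The single-letter convention can be sketched in Python as follows. This is a simplified illustration that handles only runs of letters and \mbox groups; the function names read_ordinaries and juxtapose and the tuple-based result format are assumptions, not part of the system described here.

```python
import re

# Either an \mbox{...} group (one multi-character variable)
# or a single letter (one single-letter variable).
TOKEN = re.compile(r"\\mbox\{([^}]*)\}|([A-Za-z])")


def read_ordinaries(source):
    """Return the variable names found in source, in order."""
    names = []
    for match in TOKEN.finditer(source):
        boxed, letter = match.group(1), match.group(2)
        names.append(boxed if boxed is not None else letter)
    return names


def juxtapose(source):
    """Represent adjacent variables as operands of a juxtaposition operator."""
    operands = [("ordinary", name) for name in read_ordinaries(source)]
    if len(operands) == 1:
        return operands[0]
    return ("juxtaposition", operands)
```

Under these assumptions, juxtapose("cab") yields a juxtaposition of the three ordinary objects c, a and b, whereas juxtapose(r"\mbox{cab}") yields a single ordinary object named cab.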

The classification of a math object is defined using the following command:

    (define-math-classification token classification)

In certain special cases, the predefined classification shown above can be modified. A good example is a mathematical text that consistently uses the letters [tex2html_wrap5348], [tex2html_wrap5350] and [tex2html_wrap5352] to denote functions. Using the predefined classification, the recognizer would treat [tex2html_wrap5354] as object ordinary, leading to [tex2html_wrap5356] being represented as the juxtaposition of two objects, namely, [tex2html_wrap5358] and [tex2html_wrap5360]. Declaring [tex2html_wrap5362] to be a mathematical function by executing

    (define-math-classification f mathematical-function-name)

results in occurrences of [tex2html_wrap5364] being treated as a function. Hence, [tex2html_wrap5366] is correctly recognized as a function application. Note that the correct interpretation of such notation is more important for browsing than for speaking the expression.
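The effect of reclassifying a token can be mimicked with a small Python sketch. It is not the Lisp implementation of define-math-classification: the table CLASSIFICATION, the default of "ordinary" for unregistered tokens, and the functions classify and recognize are assumptions made for illustration.

```python
# Hypothetical table mapping a token to its classification name.
CLASSIFICATION = {}


def define_math_classification(token, classification):
    """Record, or override, the classification of a token."""
    CLASSIFICATION[token] = classification


def classify(token):
    # Assumption: a token with no explicit entry is treated as ordinary.
    return CLASSIFICATION.get(token, "ordinary")


def recognize(head, argument):
    """Interpret head immediately followed by a parenthesized argument."""
    if classify(head) == "mathematical-function-name":
        return ("function-application", head, argument)
    return ("juxtaposition", head, argument)
```

Before any declaration, recognize("f", "x") returns a juxtaposition; after define_math_classification("f", "mathematical-function-name"), the same call returns a function application.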






TV Raman
Thu Mar 9 20:10:41 EST 1995