We refine the quasi-prefix form by adding the following subtypes. This makes recognizing and handling complex mathematical content cleaner.
We first introduce object math subformula, which captures subexpressions appearing within the [tex2html_wrap5306] and [tex2html_wrap5308] of (La)TeX. Object math subformula can be thought of as the math equivalent of object text block described in s:high-level-models. It has the following structure:
We need object math subformula to represent expressions of the form:
[displaymath5302]
[displaymath5303]
In representing each of the above examples, object math subformula is essential in capturing the expression to which the overbrace/underbrace applies.
To enable recognition of written mathematics, tokens have to be appropriately classified. Our classification of tokens when processing written mathematics is inspired by Appendix F of The TeXbook [Knu84].
The symbols divide naturally into groups based on their mathematical class (Ord, Op, Bin, Rel, Open, Close, or Punct), [tex2html_wrap5310]
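As a rough illustration of the seven classes named in the quotation, the following Python sketch maps a handful of well-known TeX tokens to their class. The table `TEX_CLASS` and the functions `tex_class` and `classes` are hypothetical names introduced here for illustration; they are not part of the system described in this section, and the table covers only a few sample tokens.

```python
# Hypothetical sample table: a few familiar TeX tokens and their math class.
TEX_CLASS = {
    r"\alpha": "Ord",
    r"\sum": "Op", r"\int": "Op",
    "+": "Bin", r"\times": "Bin",
    "=": "Rel", "<": "Rel", r"\leq": "Rel",
    "(": "Open", "[": "Open",
    ")": "Close", "]": "Close",
    ",": "Punct", ";": "Punct",
}


def tex_class(token):
    """Return the class of a token; single letters default to Ord."""
    if token in TEX_CLASS:
        return TEX_CLASS[token]
    if len(token) == 1 and token.isalpha():
        return "Ord"
    raise KeyError(f"no class recorded for {token!r}")


def classes(tokens):
    """Classify each token of an already tokenized expression."""
    return [tex_class(token) for token in tokens]
```

For instance, `classes(["a", "+", "b", "=", "c"])` yields `["Ord", "Bin", "Ord", "Rel", "Ord"]`.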
We introduce subtypes of object math object to correspond to each token type:
Written mathematical notation uses juxtaposition as an infix operator. Juxtaposition, as in [tex2html_wrap5340], mostly denotes multiplication, but can mean function application in certain contexts, as in [tex2html_wrap5342]. We introduce a new operator to represent juxtaposition and, to define it precisely, we also assert that all mathematical variables are single letters. Thus, [tex2html_wrap5344] is represented as the juxtaposition of three ordinary objects. This assertion is not specific to our internal representation; rather, it specifies the concrete syntax used in the electronic markup and reflects the choice made in the design of TeX. We do allow mathematical variables made up of more than one character, but these should be clearly marked up as such, e.g., as [tex2html_wrap5346], by using \mbox, as in $\mbox{cab}=cab$.
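The single-letter convention can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is a bare run of letters and \mbox groups with no operators or nesting, and the names `split_variables` and `juxtapose`, as well as the tuple representation, are hypothetical rather than the internal representation used by the system.

```python
import re

# A token is either an \mbox{...} group (one multi-character variable)
# or a single letter (one single-letter variable).
TOKEN = re.compile(r"\\mbox\{([^{}]*)\}|([A-Za-z])")


def split_variables(source):
    """Split a run of letters and \\mbox groups into variable names."""
    variables = []
    position = 0
    while position < len(source):
        match = TOKEN.match(source, position)
        if match is None:
            raise ValueError(
                f"unexpected input at offset {position}: {source[position:]!r}"
            )
        # Group 1 holds the contents of an \mbox, group 2 a single letter.
        name = match.group(1) if match.group(1) is not None else match.group(2)
        variables.append(name)
        position = match.end()
    return variables


def juxtapose(source):
    """Return a lone variable, or a juxtaposition node over several."""
    variables = split_variables(source)
    if len(variables) == 1:
        return variables[0]
    return ("juxtaposition", *variables)
```

Under these assumptions, `juxtapose("cab")` gives `("juxtaposition", "c", "a", "b")`, whereas `juxtapose(r"\mbox{cab}")` gives the single variable `"cab"`.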
The classification of a math object is defined using the following command: (define-math-classification token classification)
In certain special cases, the predefined classification shown above can be modified. A good example is a mathematical text that consistently uses the letters [tex2html_wrap5348], [tex2html_wrap5350] and [tex2html_wrap5352] to denote functions. Using the predefined classification, the recognizer would treat [tex2html_wrap5354] as object ordinary, leading to [tex2html_wrap5356] being represented as the juxtaposition of two objects, namely, [tex2html_wrap5358] and [tex2html_wrap5360]. Declaring [tex2html_wrap5362] to be a mathematical function by executing (define-math-classification f mathematical-function-name) results in occurrences of [tex2html_wrap5364] being treated as a function. Hence, [tex2html_wrap5366] is correctly recognized as a function application. Note that the correct interpretation of such notation is more important for browsing than for speaking the expression.
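The effect of such a declaration can be mimicked with a small lookup table. The Python sketch below is a loose analogue of the Lisp command, not its implementation: the names `define_math_classification`, `classify`, and `recognize_application` are hypothetical, the default of "ordinary" for undeclared tokens is an assumption, and the example assumes the expression in question is a letter followed by a parenthesized argument.

```python
# Hypothetical classification table; undeclared tokens count as ordinary.
classifications = {}


def define_math_classification(token, classification):
    """Record (or override) the classification of a token."""
    classifications[token] = classification


def classify(token):
    """Look up a token, defaulting to an ordinary object."""
    return classifications.get(token, "ordinary")


def recognize_application(head, argument):
    """Interpret `head` followed by a parenthesized `argument`."""
    if classify(head) == "mathematical-function-name":
        return ("function-application", head, argument)
    # An ordinary object next to a parenthesized group is a juxtaposition.
    return ("juxtaposition", head, argument)
```

Before any declaration, `recognize_application("f", "x")` returns a juxtaposition node; after `define_math_classification("f", "mathematical-function-name")`, the same call returns a function-application node.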