
Constructing the quasi-prefix form

The recognizer processes the mathematical content to construct the quasi-prefix form described in s:quasi-prefix. For example, given the input $a+b$, LISPIFY produces

[LVerbatim574]

Converting a list such as the one shown above to prefix form is a standard exercise covered in most programming language texts. Our implementation is based on the infix-to-prefix converter in the text on Common Lisp by Winston and Horn [HW89].

Function inf-to-pre performs the infix-to-prefix conversion. The input to this function is a list of math objects that have been processed using the classification given in s:classification-math. Each element of this list is a math object with content and attributes but no children. Note that the contents of the attributes are first converted to quasi-prefix form. For example, when recognizing [tex2html_wrap5378], the input is first converted to a list of five math objects containing the quasi-prefix representation for [tex2html_wrap5380], +, [tex2html_wrap5384], + and [tex2html_wrap5388] respectively. This is achieved by collecting the attributes that appear on each math object and processing their content recursively. Converting such a list to prefix form is now no different than processing [tex2html_wrap5390].
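As a rough illustration of this kind of conversion, the following is a minimal sketch in Python rather than the Common Lisp of the actual implementation. The precedence values, the nested-list output shape, and the Python function name inf_to_pre are assumptions made for this example; operands that are already nested lists stand in for math objects whose attributes have been converted recursively.

```python
# Hypothetical precedence values for illustration; larger binds tighter.
PRECEDENCE = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3}


def is_operator(token):
    # Operands may be nested lists (already converted), so test the type first.
    return isinstance(token, str) and token in PRECEDENCE


def inf_to_pre(tokens):
    """Convert a flat infix list into a nested prefix list.

    All operators are treated as binary and left associative.
    """
    operands, operators = [], []

    def reduce_top():
        # Combine the top operator with the two most recent operands.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append([op, left, right])

    for token in tokens:
        if is_operator(token):
            # Reduce pending operators that bind at least as tightly.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]:
                reduce_top()
            operators.append(token)
        else:
            operands.append(token)

    while operators:
        reduce_top()
    return operands[0]
```

With these assumed precedences, inf_to_pre(["a", "+", "b", "*", "c"]) returns ["+", "a", ["*", "b", "c"]]. An operand that is itself a list is passed through unchanged, which mirrors the way a list of already-converted math objects is handled like a list of plain symbols.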

We now extend this algorithm to handle ambiguous mathematical notation. Conventional parsing techniques fail, since written mathematics does not adhere to a rigorous set of precedence rules. For example, the expression [tex2html_wrap5392] means [tex2html_wrap5394] rather than [tex2html_wrap5396], even though function application is normally assigned the highest precedence. Moreover, [tex2html_wrap5398] means [tex2html_wrap5400] rather than [tex2html_wrap5402]. We have taken many such anomalies into account.

The precedence table for operators (t:precedence) lists operators in ascending order of precedence. Only one operator is shown at each level.

[table585]
Table: Precedence table for mathematical operators.

Functions define-precedence and remove-precedence allow the user to modify the precedence table. These functions, however, are not intended for casual users of AsTeR, since changing the precedence table without a clear understanding of the recognition algorithm can cause unexpected behavior.
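To make the idea of a user-modifiable precedence table concrete, here is a small Python sketch. It is not AsTeR's code: the argument lists of define_precedence and remove_precedence, the helper precedence_of, and the operators in the table are all hypothetical and chosen only for illustration. The table is modeled as a list of levels in ascending order of precedence, each level holding several operators.

```python
# Each inner list is one precedence level, lowest first.
# The operators and their ordering are placeholders, not AsTeR's table.
precedence_table = [["="], ["+", "-"], ["*", "/"]]


def precedence_of(operator):
    """Return the level index of operator, or None if it is not in the table."""
    for level, operators in enumerate(precedence_table):
        if operator in operators:
            return level
    return None


def remove_precedence(operator):
    """Remove operator from the table if it is present."""
    for operators in precedence_table:
        if operator in operators:
            operators.remove(operator)


def define_precedence(operator, same_as):
    """Place operator at the same level as the existing operator same_as."""
    level = precedence_of(same_as)
    if level is None:
        raise ValueError(f"unknown operator: {same_as!r}")
    remove_precedence(operator)  # avoid listing an operator twice
    precedence_table[level].append(operator)
```

In this sketch, calling define_precedence("\\oplus", same_as="+") makes the new operator bind exactly as + does. Because every later parse consults the same table, a single careless change affects every expression recognized afterwards, which is the risk described above.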

As pointed out earlier, precedence rules alone are not sufficient to handle written mathematics. We adapt the algorithm by using the following heuristics:

The big operators, e.g., [tex2html_wrap5432] and [tex2html_wrap5434], are treated as unary. Everything up to the next operator of lower precedence than the operator in question is considered part of the operand of the big operator. Thus, in the expression
[displaymath5374]

everything up to the = sign is treated as the summand. This technique is particularly useful in recognizing expressions like [tex2html_wrap5438]. By our heuristic, the summation is correctly recognized as the second argument to the + sign. Further, the summand is terminated by the = sign. The expression is now equivalent to recognizing [tex2html_wrap5444], which can be handled by the standard algorithm.
The integral operator can have an optional delimiter, as in [tex2html_wrap5446]. If the [tex2html_wrap5448] is present and is recognizable, i.e., has been marked up as \d{x} as opposed to dx, it is recognized as the closing delimiter; the variable of integration is inferred. However, this closing delimiter may not always be available: it may be encoded ambiguously, as in $\int f dx$, or the integral itself may not require a closing [tex2html_wrap5452], as in [tex2html_wrap5454]. In the former case, our recognizer treats the juxtaposition [tex2html_wrap5456] as the integrand. Though this may seem incorrect, it is in fact exactly what the typeset output means. In the latter case, the earlier rule (treating the operand of a big operator as everything up to the first operator of lower precedence) applies. Hence, we can correctly recognize [tex2html_wrap5458].
The closing delimiter [tex2html_wrap5460] is treated as such only if it occurs at the top level. Thus, in $\frac{\dx}{x}$, the \dx does not end the integrand. This allows us to recognize such integrals correctly, but in this case we cannot infer the variable of integration. There seems to be no clean solution for this problem. Written mathematical notation relies on the fact that [tex2html_wrap5462] means [tex2html_wrap5464] and the integrand is therefore [tex2html_wrap5466].
Function application is treated as right associative. This results in [tex2html_wrap5468] being interpreted correctly. Since juxtaposition has been assigned a higher precedence than function application, [tex2html_wrap5470] continues to be recognized correctly. The following equation is a good example of such ambiguous notation. Note the complete absence of parentheses:
[displaymath5375]
In written mathematics, delimiters do not always match. For example, [tex2html_wrap5472] denotes a semi-open interval. There are also cases where there is no matching closing delimiter. The recognizer is aware of such anomalies and handles them correctly. When it sees an open delimiter, it scans forward to the end of the math expression for the first matching close delimiter of the same kind. If one is found, then all of the input up to this point is treated as the delimited expression. If no matching close delimiter of the same kind is found, then the first unmatched close delimiter delimits the input. Otherwise, the occurrence is treated as an unmatched delimiter.
The [tex2html_wrap5474] is one of the few postfix operators used in written mathematics. This is treated as a special case, and we confirm that the [tex2html_wrap5476] is indeed a factorial sign by making sure that it does not have any attributes. Thus, [tex2html_wrap5478] is not a factorial symbol.
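
Two of the heuristics above, the extent of a big operator's operand and the search for a closing delimiter, can be sketched in Python as follows. This is one possible reading of the prose, not the recognizer's actual code. The function names, the token representation (a flat list of strings), and the precedence values are assumptions; in particular, placing \sum between = and + is only a guess made for the example.

```python
# Hypothetical precedences; larger binds tighter.
BIG_OP_PRECEDENCE = {"=": 1, "\\sum": 2, "\\int": 2, "+": 3, "-": 3}

# Open delimiters and the close delimiter of the same kind.
CLOSER = {"(": ")", "[": "]", "{": "}"}


def big_operator_operand(tokens, index):
    """Return (operand_tokens, end) for the big operator at tokens[index].

    The operand runs up to the next operator of lower precedence.
    """
    limit = BIG_OP_PRECEDENCE[tokens[index]]
    end = index + 1
    while end < len(tokens):
        token = tokens[end]
        if token in BIG_OP_PRECEDENCE and BIG_OP_PRECEDENCE[token] < limit:
            break
        end += 1
    return tokens[index + 1:end], end


def find_close(tokens, open_index):
    """Return the index of the token closing tokens[open_index], or None."""
    opener = tokens[open_index]
    rest = range(open_index + 1, len(tokens))

    # First choice: a close delimiter of the same kind, allowing nesting.
    depth = 0
    for i in rest:
        if tokens[i] == opener:
            depth += 1
        elif tokens[i] == CLOSER[opener]:
            if depth == 0:
                return i
            depth -= 1

    # Fallback: the first close delimiter of any kind that is not
    # paired with an open delimiter inside the scanned span.
    pending = 0
    for i in rest:
        if tokens[i] in CLOSER:
            pending += 1
        elif tokens[i] in CLOSER.values():
            if pending == 0:
                return i
            pending -= 1

    return None  # genuinely unmatched open delimiter
```

Under these assumptions, find_close(list("[a,b)"), 0) returns 4, so the semi-open interval is delimited by the mismatched parenthesis, while find_close(list("(a+b"), 0) returns None and the parenthesis is left as an unmatched delimiter.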


TV Raman
Thu Mar 9 20:10:41 EST 1995