$a+b$, LISPIFY produces
[LVerbatim574]
Converting a list as shown above to prefix form is a simple exercise and can be found in most programming language texts. Our implementation is based on the infix-to-prefix converter in the text on Common Lisp by Winston and Horn [HW89].
Function inf-to-pre performs the infix-to-prefix conversion. The input to this function is a list of math objects that have been processed using the classification given in s:classification-math. Each element of this list is a math object with content and attributes but no children. Note that the contents of the attributes are first converted to quasi-prefix form. For example, when recognizing [tex2html_wrap5378], the input is first converted to a list of five math objects containing the quasi-prefix representation for [tex2html_wrap5380], +, [tex2html_wrap5384], + and [tex2html_wrap5388] respectively. This is achieved by collecting the attributes that appear on each math object and processing their content recursively. Converting such a list to prefix form is now no different than processing [tex2html_wrap5390].
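The conversion described above can be illustrated with a minimal precedence-climbing sketch. It is written in Python for brevity and is not AsTeR's Lisp implementation; the precedence values, the `RIGHT_ASSOC` set, and the representation of math objects as plain strings are all assumptions made for the example.

```python
# Hypothetical precedence table (larger number binds tighter).
# This is NOT AsTeR's actual table; it only serves the illustration.
PRECEDENCE = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3, "^": 4}
RIGHT_ASSOC = {"^"}  # assumed right-associative operators


def inf_to_pre(tokens):
    """Convert a flat, non-empty infix list (operand, operator, operand, ...)
    into a nested prefix list such as ["+", "a", "b"]."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]  # an operand; it has no children of its own
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            prec = PRECEDENCE[op]
            if prec < min_prec:
                break  # this operator belongs to an enclosing call
            pos += 1
            # Left-associative operators require a strictly tighter right side.
            next_min = prec if op in RIGHT_ASSOC else prec + 1
            right = parse(next_min)
            left = [op, left, right]
        return left

    return parse(0)


if __name__ == "__main__":
    print(inf_to_pre(["a", "+", "b", "*", "c"]))
```

In this sketch each operand is an opaque leaf, which mirrors the statement that the input elements have content and attributes but no children: attributes would already have been converted recursively before this step runs.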
We now extend this algorithm to handle ambiguous mathematical notation. Conventional parsing techniques fail, since written mathematics does not adhere to a rigorous set of precedence rules. For example, the expression [tex2html_wrap5392] means [tex2html_wrap5394] rather than [tex2html_wrap5396], even though function application is normally assigned the highest precedence. Moreover, [tex2html_wrap5398] means [tex2html_wrap5400] rather than [tex2html_wrap5402]. We have taken many such anomalies into account.
The operator precedence table (t:precedence) lists operators in ascending order of precedence. Only one operator is shown at each level.
[table585]
Table: Precedence table for mathematical operators.
Functions define-precedence and remove-precedence allow the user to modify the precedence table. They are not, however, intended for casual users of AsTeR, since changing the precedence table without a clear understanding of the recognition algorithm can cause unexpected behavior.
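One way to picture a user-modifiable precedence table is sketched below. This is only an illustration in Python: the class `PrecedenceTable`, the method signatures, and the `same_as` parameter are hypothetical, and the source does not specify how AsTeR's define-precedence and remove-precedence are actually called.

```python
class PrecedenceTable:
    """Operators grouped into levels, listed in ascending order of precedence.

    Hypothetical structure for illustration; not AsTeR's data structure.
    """

    def __init__(self, levels):
        # Copy each level so later edits do not alias the caller's lists.
        self.levels = [list(level) for level in levels]

    def level_of(self, op):
        """Return the index of the level holding `op` (higher binds tighter)."""
        for index, level in enumerate(self.levels):
            if op in level:
                return index
        raise KeyError(op)

    def remove_precedence(self, op):
        """Drop `op` from the table; emptied levels are kept so indices stay stable."""
        for level in self.levels:
            if op in level:
                level.remove(op)

    def define_precedence(self, op, same_as):
        """Give `op` the same precedence as the existing operator `same_as`."""
        target = self.level_of(same_as)
        self.remove_precedence(op)
        self.levels[target].append(op)


if __name__ == "__main__":
    table = PrecedenceTable([["="], ["+", "-"], ["*"]])
    table.define_precedence("/", same_as="*")
    print(table.level_of("/"))
```

Because every later parsing decision consults this table, a single misplaced operator changes how all expressions containing it are grouped, which is why such edits are best left to users who understand the recognition algorithm.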
As pointed out earlier, precedence rules alone are not sufficient to handle written mathematics. We adapt the algorithm by using the following heuristics:
[displaymath5374]
everything up to the = sign is treated as the summand. This technique is particularly useful in recognizing expressions like [tex2html_wrap5438]. By our heuristic, the summation is correctly recognized as the second argument to the + sign. Further, the summand is terminated by the = sign. Recognizing the expression is now equivalent to recognizing [tex2html_wrap5444], which can be handled by the standard algorithm.
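The big-operator heuristic (the operand runs up to the first operator of lower precedence) can be sketched as a preprocessing pass that folds each big operator and its operand into a single node. The Python below is an illustration only: the token name `"sum"`, the precedence values, and the helper names are assumptions, and AsTeR's actual table is not reproduced here.

```python
# Hypothetical precedences (larger binds tighter); "sum" is placed just
# above "=" purely for this illustration.
PRECEDENCE = {"=": 1, "sum": 2, "+": 3, "*": 4}
BIG_OPERATORS = {"sum"}


def big_operator_operand(tokens, start):
    """Return (operand_tokens, end) for the big operator at tokens[start].

    The operand extends to the first operator whose precedence is lower
    than that of the big operator, or to the end of the list.
    """
    big_prec = PRECEDENCE[tokens[start]]
    end = start + 1
    while end < len(tokens):
        tok = tokens[end]
        if tok in PRECEDENCE and PRECEDENCE[tok] < big_prec:
            break
        end += 1
    return tokens[start + 1:end], end


def collapse_big_operators(tokens):
    """Replace each big operator and its operand by one [operator, operand] node.

    Expects a flat list of strings; the result can then be handed to an
    ordinary infix-to-prefix converter.
    """
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] in BIG_OPERATORS:
            operand, end = big_operator_operand(tokens, i)
            out.append([tokens[i], operand])
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(collapse_big_operators(["s", "+", "sum", "a", "*", "b", "=", "c"]))
```

After the pass, the summation occupies a single operand position, so it becomes the second argument of the + sign and the remaining list is an ordinary infix expression.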
\d{x} as opposed to dx, it is recognized as the closing delimiter; the variable of integration is inferred. However, this closing delimiter may not always be available: it may be encoded ambiguously, as in $\int f dx$, or the integral itself may not require a closing [tex2html_wrap5452], as in [tex2html_wrap5454]. In the former case, our recognizer treats the juxtaposition [tex2html_wrap5456] as the integrand. Though this may seem incorrect, it is in fact exactly what the typeset output means. In the latter case, the earlier rule (treating the operand of a big operator as everything up to the first operator of lower precedence) applies. Hence, we can correctly recognize [tex2html_wrap5458].
$\frac{\dx}{x}$, the \dx does not end the integrand. This allows us to recognize such integrals correctly, but we can then no longer infer the variable of integration. There seems to be no clean solution to this problem. Written mathematical notation relies on the fact that [tex2html_wrap5462] means [tex2html_wrap5464] and the integrand is therefore [tex2html_wrap5466].
[displaymath5375]