
Converting to the least common denominator

Given documents marked up in several different markup languages, an alternative solution is to convert all of them to a form that is the least common denominator of the various document encodings. This can be done by converting the documents either to plain ASCII or to a display-specific format such as PostScript. Both alternatives have shortcomings, as outlined below.

Converting to ASCII loses layout structure. Since layout is the only thing that cues logical structure in a formatted document, this form of conversion discards the structural information along with the layout.
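The loss can be illustrated with a minimal sketch. It assumes a tiny LaTeX-like fragment and a deliberately naive converter; the names `MARKED_UP` and `to_plain_ascii` are hypothetical and are not part of any system discussed here. Once the markup is dropped, nothing in the output distinguishes the section title from the body text, or the emphasized word from its neighbors.

```python
import re

# Hypothetical LaTeX-like fragment; the markup carries the logical structure.
MARKED_UP = r"""\section{Results}
The method is \emph{fast}.
"""


def to_plain_ascii(source: str) -> str:
    """Naively replace each \\command{text} wrapper with its enclosed text."""
    pattern = re.compile(r"\\[a-zA-Z]+\{([^{}]*)\}")
    previous = None
    # Repeat until nothing changes, so nested commands are unwrapped too.
    while previous != source:
        previous = source
        source = pattern.sub(r"\1", source)
    return source


plain = to_plain_ascii(MARKED_UP)
print(plain)
```

In the result, "Results" is just another line of text: a program reading it can no longer tell that it was a section heading, which is the information loss described above.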

An alternative solution is adopted by systems like Adobe Acrobat. Portable Document Format (PDF), a portable form of PostScript, is used by Adobe Acrobat as a common currency between different computing platforms. The encoded document can be displayed with its original layout on disparate computing platforms without the software used to produce the original document. This solution does allow users to exchange documents without losing any layout information. However, it is only one step better than exchanging printed paper: exchanging PDF files is like exchanging electronic paper! For example, the information present in the document cannot be manipulated electronically. This also means that the information and its inherent structure can be accessed in only one way: by a human looking at it. The principal advantage of having information online, the ability to process it, is lost. In addition, this approach has the serious disadvantage of making electronic information inaccessible to persons with special needs.



TV Raman
Thu Mar 9 20:10:41 EST 1995