Accessible Document Encodings

Reliance on the screen-reading paradigm to provide access to electronic information has led to the common misconception that an accessible document is an ASCII document. This is not true! Though many ASCII documents (electronic documents that contain plain text with no control codes) are accessible using screen-reading software, ASCII documents that rely on implicit visual layout in the form of spacing and vertical alignment are inaccessible. Thus, an ASCII display of a fraction or table is inaccessible for the reasons pointed out in the previous section.
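The point about implicit visual layout can be illustrated with a small sketch. The function read_linearly below is a hypothetical, deliberately simplified stand-in for a line-oriented reader; it is not a model of any particular screen-reading program. It shows that a line-by-line reading of an ASCII fraction yields its pieces but not the relationship between them.

```python
# An ASCII rendering of the fraction (a + b) / (c + d).
# The fraction structure is implied only by vertical alignment.
ascii_fraction = (
    " a + b \n"
    "-------\n"
    " c + d "
)


def read_linearly(text: str) -> str:
    """Read text one line at a time, top to bottom.

    Hypothetical helper for this sketch: a rule made only of hyphens
    is announced as "dashes"; every other line is read as written.
    """
    spoken = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if set(stripped) == {"-"}:
            spoken.append("dashes")
        else:
            spoken.append(stripped)
    return ", ".join(spoken)


print(read_linearly(ascii_fraction))
```

Nothing in the resulting reading says that the first line is a numerator and the last a denominator; that information exists only in the two-dimensional arrangement of the characters.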

This thesis has focused on presenting structured information orally, with a view to conveying the underlying structure using audio layout. This kind of oral presentation requires full access to both the information and its underlying structure. Based on our experience, we characterize the accessibility of a document encoding by the following criteria:

  1. The amount of structural information captured by the encoding.
  2. The extent to which this structural information is available for processing by other applications.
  3. The availability of the appropriate software needed to process this structure.

Thus, document encodings such as PostScript and PDF are inaccessible because extracting document structure from purely visual layout is hard. Similarly, the internal formats used by WYSIWYG (What You See Is What You Get) systems are inaccessible, since the assumed mode of presentation is visual.

Document encodings using markup languages such as (La)TeX are better suited for oral access to information, because they encode the information in a layout-independent manner. As pointed out in the chapter on recognition (Chapter 2), extracting high-level structure from the (La)TeX source, though possible, is fairly involved.

The advantage of grammar-based systems like (La)TeX is that they encapsulate the information in a manner that allows alternative processing. This advantage is fully realized by the Standard Generalized Markup Language (SGML), which is the best available choice for accessible encodings. Note, however, that a document does not become accessible simply by being encoded in SGML. The accessibility of an SGML document is determined by the Document Type Definition (DTD) to which it adheres. Thus, a DTD that does not capture any high-level structure leads to inaccessible SGML documents.
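What "alternative processing" means can be sketched as follows. The example uses XML, parsed with Python's standard xml.etree.ElementTree module, as a stand-in for SGML. The element names frac, num, and den, the functions speak and show, and the spoken wording are all assumptions made for this illustration; they are not taken from any actual DTD or from the system described in this thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural markup for the fraction (a + b) / (c + d).
# The element names are invented for this sketch.
structured = "<frac><num>a + b</num><den>c + d</den></frac>"


def speak(node: ET.Element) -> str:
    """Render a node as a spoken-style string (wording is illustrative)."""
    if node.tag == "frac":
        num = speak(node.find("num"))
        den = speak(node.find("den"))
        return f"fraction with numerator {num} and denominator {den}, end fraction"
    # Leaf elements in this sketch contain plain text only.
    return (node.text or "").strip()


def show(node: ET.Element) -> str:
    """Render the same node as a one-line visual string."""
    if node.tag == "frac":
        return f"({show(node.find('num'))}) / ({show(node.find('den'))})"
    return (node.text or "").strip()


tree = ET.fromstring(structured)
print(speak(tree))
print(show(tree))
```

Both renderings are derived from the same encoding because the markup names the parts of the fraction. Markup that recorded only lines of characters would give speak nothing to work with, which is the sense in which a DTD that captures no high-level structure yields inaccessible documents.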