Preface

The advent of electronic documents makes information available in more than its visual form: electronic information can now be display-independent. We describe a computing system, AsTeR, that audio formats electronic documents to produce audio documents.

The development of AsTeR was the basis of the author’s dissertation, which was presented to the Faculty of the Graduate School of Cornell University in fulfillment of the Requirements for the Degree of Doctor of Philosophy in 1994. This preface, written three years later, puts the work in perspective with respect to the developments in the world of electronic information and auditory interfaces between 1994 and 1997. The main body of this work remains identical to what was presented in 1994.


  Introduction
  Speech-enabling Applications
  Structured Information And The WWW
  Conclusion

Introduction

AsTeR was motivated by the insight that information presentation needs to take advantage of the specific perceptual modality in use. This typeset manuscript exploits features of visual interaction to convey information effectively; in the same vein, AsTeR introduced the notion of audio formatting to enable rich aural presentations of structured information.

Speech-enabling Applications

Starting in late 1994, the insights gained from developing and using AsTeR have been applied to the more general problem of providing aural access to computer interfaces. Computer interfaces encapsulate man-machine dialogue. Once we realize that “The document is the interface”, the technique of synthesizing effective aural presentations from the information itself, rather than from its visual presentation, leads naturally to the speech-enabling approach (see [Ram96a, Ram96b, Ram97b]). The speech-enabling approach, a technique that separates computation from the user interface, is described in detail in [Ram97a]. Application designers can implement desired features in the computational component and have different user interfaces expose the resulting functionality in the manner most suited to a given user environment. This leads to the design of high-quality Auditory User Interfaces (AUI) that integrate speech as a first-class citizen into the user interface.
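The separation described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the Mailbox class, the Message record and both summary functions are hypothetical names chosen for illustration. The point is only that the visual and the spoken front ends are each derived from the same computational component, and that neither is derived from the other.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical record type used only for this illustration.
    sender: str
    subject: str
    unread: bool


class Mailbox:
    """Computational component: data and logic, with no knowledge of display."""

    def __init__(self, messages: List[Message]) -> None:
        self.messages = messages

    def unread(self) -> List[Message]:
        return [m for m in self.messages if m.unread]


def visual_summary(box: Mailbox) -> str:
    """Visual front end: a compact listing meant to be scanned by eye."""
    lines = [f"Inbox ({len(box.unread())} unread)"]
    for m in box.messages:
        marker = "*" if m.unread else " "
        lines.append(f"{marker} {m.sender:<8} {m.subject}")
    return "\n".join(lines)


def spoken_summary(box: Mailbox) -> str:
    """Aural front end: a sentence built from the data, not from the screen."""
    unread = box.unread()
    if not unread:
        return "You have no unread messages."
    noun = "message" if len(unread) == 1 else "messages"
    parts = [f"{m.subject}, from {m.sender}" for m in unread]
    return f"You have {len(unread)} unread {noun}: " + "; ".join(parts) + "."


box = Mailbox([
    Message("Alice", "Lunch on Friday", True),
    Message("Bob", "Quarterly report", False),
])
print(visual_summary(box))
print(spoken_summary(box))
```

A screen-reading approach would instead start from the text produced by visual_summary and try to speak it; here the spoken form can reorder and rephrase freely because it has access to the underlying information.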

Structured Information And The WWW

AsTeR pointed out the advantages to come in a world where documents are first created electronically before being turned into modality-specific presentations such as typeset documents for printing. The work also pointed out the need for such electronic information to be well-structured to enable computation on this information.

The last few years have seen an explosive growth in electronic information on the Internet fueled by the popularity of the WWW. The initial rush to the WWW resulted in publishers putting out rich visual content with a concomitant abuse of document structure as envisioned in AsTeR. As a consequence, content providers on the WWW today face many of the challenges outlined in AsTeR when attempting to create electronic content that can be repurposed for publishing online as well as in traditional print formats. This has also led to a vast amount of Webformation that is becoming increasingly difficult to navigate and categorize (see [Hay96, Gib96]).

Faced with these challenges, content providers on the WWW are now looking to create richly tagged information using markup systems like XML. As the first such example, Mathematical Markup Language (MathML) is an XML application for describing mathematical notation and capturing both its structure and content. The goal of MathML is to enable mathematics to be served, received, and processed on the Web, just as HTML has enabled this functionality for text (see http://www.w3.org/TR/WD-math/).
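To make concrete why structured markup enables computation, the following sketch walks a small MathML-style fragment and derives both a linear visual form and a spoken form from the same tree. It is an illustration under stated assumptions, not AsTeR's audio formatting rules: the fragment omits the XML namespace that a real MathML document would carry, only a handful of presentation elements are handled, and the spoken wording is invented for this example.

```python
import xml.etree.ElementTree as ET

# Simplified fragment for (a + b) / 2; namespace omitted for brevity (assumption).
MATHML = """
<math>
  <mfrac>
    <mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>
    <mn>2</mn>
  </mfrac>
</math>
"""

# Hypothetical spoken names for a few operators.
SPOKEN_OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}


def visual(node: ET.Element) -> str:
    """Render the tree as a one-line visual string."""
    if node.tag in ("mi", "mn", "mo"):
        return (node.text or "").strip()
    children = [visual(child) for child in node]
    if node.tag == "mfrac":
        return f"({children[0]})/({children[1]})"
    return " ".join(children)


def spoken(node: ET.Element) -> str:
    """Render the same tree as a phrase intended to be read aloud."""
    text = (node.text or "").strip()
    if node.tag == "mo":
        return SPOKEN_OPERATORS.get(text, text)
    if node.tag in ("mi", "mn"):
        return text
    children = [spoken(child) for child in node]
    if node.tag == "mfrac":
        return (f"the fraction with numerator {children[0]} "
                f"and denominator {children[1]}")
    return " ".join(children)


root = ET.fromstring(MATHML.strip())
print(visual(root))
print(spoken(root))
```

Because the fraction is marked up as a fraction rather than drawn as a horizontal bar, each renderer can choose the cue that suits its modality: parentheses and a slash on screen, an explicit verbal frame in speech.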

Conclusion

As humans, we see, hear, feel and smell. Human interaction is enriched by the concomitant redundancy introduced by multimodal communication. In contrast, computer interfaces until now have relied primarily on visual interaction; today’s interfaces are like the silent movies of the past! As we approach the turn of the century, computers now have the ability to talk, listen and, perhaps, even understand. Integrating new modalities like speech into human-computer interaction requires rethinking how information systems are designed in today’s world of visual computing.

Visually rich computing introduced the notion of What You See Is What You Get (WYSIWYG) documents; but by carrying it too far, we risk ending up in a world of “What You See Is All You Have” documents. On the positive side, the exponential growth in electronic information, combined with a desire to process this content intelligently and to access it whenever, wherever and however the user chooses, gives strong reason to believe that the world will move away from the present situation of see-only documents.

That a blind person can navigate the Internet just as efficiently and effectively as any sighted person attests to the profound potential of digital documents to improve human communication. Printed documents are fixed snapshots of changing ideas; they limit the means of communication to the paper on which they are stored. But in electronic form, documents can become raw material for computers that can extract, catalogue and rearrange the ideas in them. Used properly, technology can separate the message from the medium so that we can access information wherever, whenever and in whatever form we want.

Archiving information in a structurally rich form will ensure that this vast repository of knowledge can be reused, searched and displayed in ways that best suit individuals’ needs and abilities, using software not yet invented or even imagined.

The coming millennium is likely to prove an exciting one in the world of electronic information.

T. V. Raman
December 6, 1997, Mountain View, CA. URL: http://cs.cornell.edu/home/raman