Screen-readers have helped open up the world of computing to visually impaired users. However, the spoken interface they provide leaves much to be desired.
The primary shortcoming with such interfaces is their inability to convey the structure present in visually displayed information. Since the screen-reading application has only the contents of the visual display to examine, it conveys little or no contextual information about what is being displayed. Put another way:
A screen-reader speaks what is on the screen without conveying why it is there.
As a consequence, accessing applications that display highly structured output in a visually pleasing manner is cumbersome with a screen-reader. Here is a simple example to illustrate this point. A typical calendar display is made up of a table showing the days of the week. This information is visually laid out to let the eye quickly see what day of the week a particular date of the month falls on. Thus, given the display shown in Fig. 1, it is easy to answer the question ``What day is it today?''.
            Jan 1995
 S    M    T    W    Th   F    Sa
 1    2    3    4    5    6    7
 8    9   10   11   12   13   14
15   16   17   18   19   20   21
22   23   24   25   26   27   28
29   30   31

Figure 1: A Typical Calendar Application
When this same display is accessed with a screen-reader, the user hears the entire contents of the calendar spoken aloud. This results in the following set of meaningless utterances:
pipe pipe 1 pipe 2 pipe 3 pipe 4 pipe 5 pipe 6 pipe 7 pipe pipe ... pipe pipe 29 pipe 30 pipe 31 pipe pipe pipe pipe pipe pipe
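These utterances arise because the screen-reader sees only the characters on the display. A minimal sketch of this behavior, with a hypothetical helper that turns one row of screen text into speech (real screen-readers expose no such function; this is purely illustrative):

```python
def speak_screen_row(row: str) -> str:
    """Turn one row of screen text into a spoken utterance.

    The screen-reader has no notion of table structure, so the
    vertical rules used for layout are verbalized as ``pipe''
    right alongside the dates.
    """
    words = []
    for token in row.split():
        if token == "|":
            words.append("pipe")   # layout punctuation becomes noise
        else:
            words.append(token)    # dates are spoken as bare numbers
    return " ".join(words)

# The first row of dates in Fig. 1, rendered with vertical rules:
row = "| | 1 | 2 | 3 | 4 | 5 | 6 | 7 |"
print(speak_screen_row(row))
# -> pipe pipe 1 pipe 2 pipe 3 pipe 4 pipe 5 pipe 6 pipe 7 pipe
```

The point of the sketch is that the noise is structural: no amount of post-processing the character grid can recover *why* the ``1'' sits in the Sunday column.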
Alternatively, the characters under the application cursor can be spoken. In the case of Fig. 1, the listener would hear the system say ``one''. To answer the question ``What day is it today?'' the user must first build a mental representation of the visual display, and then navigate around the screen, examining the contents that appear in the same screen column as the 1, in order to infer that the date is Sunday, January 1, 1995.
Screen-readers for both character-cell and graphical displays suffer from this shortcoming. This is a consequence of trying to read the screen instead of providing true spoken feedback. The rest of this paper describes Emacspeak, an interface that treats speech as a first-class output medium. Screen-readers speak the screen contents after the application has displayed its results; Emacspeak integrates spoken feedback into the application itself. This tight integration between the spoken output and the user application enables Emacspeak to provide rich, context-sensitive spoken feedback. As a case in point, when using the calendar application, the user hears the current date as Sunday, January 1, 1995. For related work in integrating speech as a first-class I/O medium into general user applications, see [YLM95].
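The contrast can be sketched as follows. Assume, purely for illustration, that the calendar's internal state is a date object rather than a grid of characters; the function names are hypothetical and stand in for calls to an actual speech server:

```python
from datetime import date

def screen_reader_feedback(cell_text: str) -> str:
    """Screen-reading: only the characters under the cursor are
    available, so the feedback is just the cell's text."""
    return cell_text

def emacspeak_style_feedback(d: date) -> str:
    """Application-integrated feedback: the application's own data
    model (the date itself) is available, so the spoken output can
    carry full context."""
    # Avoid platform-specific strftime flags for the day number.
    return f"{d.strftime('%A, %B')} {d.day}, {d.year}"

today = date(1995, 1, 1)
print(screen_reader_feedback("1"))       # -> 1
print(emacspeak_style_feedback(today))   # -> Sunday, January 1, 1995
```

The design point is that the richer utterance costs nothing extra at speaking time; it merely requires that the speech code run inside the application, where the date object lives, rather than scraping the rendered grid afterwards.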
We conclude this introduction by pointing out that visual layout plays an important role in cuing the reader to information structure. Such visual cues reduce cognitive load by allowing the perceptual system to pick up the inherent structure of the information, freeing the cognitive system to process its content. Spoken feedback produced from the visual layout proves difficult to understand because many of the structural cues are lost; worse, other structural cues turn into noise (the ``pipe pipe ...'' above is a case in point). The listener must therefore spend many cognitive cycles parsing the spoken utterance, making the information considerably harder to understand. Speaking the information in an aurally pleasing manner alleviates this burden, leading to better aural comprehension.