Auditory User Interfaces -- List of Figures
- Figure 1.1
- Computing applications typically consist of obtaining
user input, computing on this information, and finally
displaying the results. The first and third phases in this
process constitute the user interface. As can be seen, it is
possible to separate the user interface from the
computational phase.
- Figure 1.2
- Calendars are displayed visually using a two-dimensional
layout that makes it easy to see the underlying structure.
The calendar display consists of a set of characters on the
screen, but the meaning of this display is as much in its
visual layout as in the characters themselves. Merely
speaking the text fails to convey meaning. We can see that
January 1, 2000 is a Saturday; this information is lost
when the visual display is spoken.
- Figure 2.1
- Sub-components of recorded prompts used by an interactive
voice response (IVR) system at a bank. Different prompts can
be generated by concatenating the appropriate components.
- Figure 2.2
- Phonemes in American English. The various vowels and
consonants making up standard American English are shown
using a two-letter notation; each phoneme appears along with
a word containing it.
- Figure 2.3
- Textual description of a nested exponent. Notice that
when reading the prose making up this description, it is very
difficult to perceive the underlying structure of the
mathematical expression.
- Figure 2.4
- A call management system using word spotting. Users can
express the same command in several ways. The recognition
system looks for key phrases that determine the user command,
thereby allowing for a flexible system.
- Figure 2.5
- Coarticulatory effects in continuous speech.
Coarticulatory effects (or the lack thereof) are often a
problem when trying to synthesize natural-sounding speech.
Not surprisingly, the presence of these same effects in human
speech makes the computer's task of recognizing continuous
speech even harder.
- Figure 2.6
- Using spatial audio to encode information about incoming
email. Auditory cues indicate the arrival of new mail; these
cues use spatial audio to encode additional information such
as the urgency of the message.
- Figure 3.1
- Visual realization of conversational gestures, the
building blocks for dialogues. User interface design tries to
bridge the impedance mismatch in man-machine communication by
inventing a basic set of conversational gestures that can be
effectively generated and interpreted by both man and
machine.
- Figure 4.1
- The Emacspeak desktop consists of a set of active buffer
objects. This display shows a subset of currently active
buffers on my desktop.
- Figure 4.2
- A sample directory listing. The visual interface exploits
vertical alignment to implicitly encode the meaning of each
field in the listing.
- Figure 4.3
- A listing of running processes. The task manager helps in
tracking system resources. Processes can be killed or
suspended from the task manager.
- Figure 4.4
- Commands available while searching. A set of highly
context-specific conversational gestures.
- Figure 4.5
- Outline view of this section. It can be used to move
quickly to different logical components of the document.
- Figure 4.6
- Result of folding the lexical analyzer in AsTeR. This is
a document consisting of over 2,000 lines. Folding helps in
organizing the code, obtaining quick overviews, and
navigating efficiently.
- Figure 4.7
- Sample collection of dynamic macros available when
editing C source. Standard C constructs can be generated with
a few gestures.
- Figure 4.8
- A sample C program. It can be created with a few gestures
when using dynamic macros; a sketch of the idea follows.
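As a rough illustration of such template-based code
generation, here is a minimal sketch using Emacs' stock tempo
library (a stand-in for the dynamic macro facility the book
describes, not the same package):

```emacs-lisp
(require 'tempo)

;; Define a template named "c-for"; this creates the command
;; tempo-template-c-for.  In the element list, `p' marks an
;; editable stop point, `n>' inserts a newline and indents,
;; `r>' wraps an active region, and `>' indents the line.
(tempo-define-template
 "c-for"
 '("for (" p "; " p "; " p ") {" n> r> n "}" >))
```

Invoking M-x tempo-template-c-for then inserts the loop
skeleton, leaving point at the first stop.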
- Figure 4.9
- A sample HTML page. Template-based authoring makes
creating such documents easy.
- Figure 4.10
- Visual display of a structured data record. The data
record is visually formatted to display each field name along
with its value.
- Figure 4.11
- An expense report. The semantics of the various fields in
each record are implicitly encoded in the visual layout.
- Figure 4.12
- Tracking an investment portfolio. Modifying entries can
cause complex changes to the rest of the document.
- Figure 4.13
- A train schedule. We typically look for the information
we want, rather than reading the entire timetable.
- Figure 4.14
- Commands in table browsing mode. The interface enables
the user to locate the desired item of information without
having to read the entire table.
- Figure 4.15
- A well-formatted display of the message headers presents
a succinct overview of an email message in the visual
interface. Speaking this visual display does not produce a
pleasant spoken interface; the spoken summary needs to be
composed directly from the underlying information making up
the visual display.
- Figure 4.16
- Newsgroups with unread articles are displayed in the
*Group* buffer. This buffer provides special
commands for operating on newsgroups. The visual interface
shows the name of the group preceded by the number of unread
articles.
- Figure 4.17
- Unread articles are displayed in the buffer *Group
Summary*. This buffer is augmented with special
commands for reading and responding to news postings. The
visually formatted output succinctly conveys article
attributes such as author and subject.
- Figure 4.18
- More than one opening delimiter can appear on a line.
When typing the closing delimiter, Emacspeak speaks the line
containing the matching delimiter. The spoken feedback is
designed to accurately indicate which of the several open
delimiters is being matched.
- Figure 4.19
- An example of comparing different versions of a file.
Visual layout exploits changes in fonts to set apart the two
versions. The reader's attention is drawn to specific
differences by visual highlighting; here, they are shown in
a bold font. Visual interaction relies on the eye's ability
to quickly navigate a two-dimensional display. Directly
speaking such displays is both tedious and unproductive.
- Figure 4.20
- Browsing the Java Development Kit (JDK 1.1) using a rich
visual interface. Understanding large object-oriented systems
requires rich browsing tools. Emacspeak speech-enables a
powerful object-oriented browser to provide a pleasant
software development environment.
- Figure 4.21
- Emacspeak is implemented as a series of modular layers.
Low-level layers provide device-specific interfaces. Core
services are implemented in a device-independent layer.
Application-specific extensions rely on these core
services; a sketch of the layering follows.
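As a rough, hypothetical sketch of this layering (the names
below are illustrative, not Emacspeak's actual API), a
device-specific primitive can be wrapped by a
device-independent core service:

```emacs-lisp
(require 'thingatpt)

;; Hypothetical handle on a running speech server; in a real
;; system this would be started and managed elsewhere.
(defvar my-tts-process nil
  "Process object for the speech server.")

;; Device-specific layer: ship text to one particular server.
(defun my-tts-speak (text)
  "Queue TEXT with the speech server."
  (process-send-string my-tts-process (concat text "\n")))

;; Device-independent core service built on the primitive;
;; application-specific extensions would call services like this.
(defun my-speak-line ()
  "Speak the line containing point."
  (interactive)
  (my-tts-speak (or (thing-at-point 'line) "")))
```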
- Figure 4.22
- Advice is a powerful technique for extending the
functionality of pre-existing functions without modifying
their source code. Here, we show the calling sequence for a
function $f$ that has before, around, and after advice
defined; a sketch follows.
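A minimal sketch of those three advice classes, using the
Emacs Lisp advice facility of that era ($f$ and the trace
messages are illustrative only):

```emacs-lisp
(defun f (x) (* x x))

(defadvice f (before trace-entry activate)
  "Runs before the original f."
  (message "before f"))

(defadvice f (around trace-body activate)
  "Wraps the original f; `ad-do-it' runs the wrapped body."
  (message "around f: entering")
  ad-do-it
  (message "around f: leaving"))

(defadvice f (after trace-exit activate)
  "Runs after the original f; `ad-return-value' is its result."
  (message "after f: result %s" ad-return-value))
```

Calling (f 3) now traces before, around-entering,
around-leaving, and after, in that order, and still
returns 9.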
- Figure 4.23
- Example of advising a built-in Emacs command to speak.
Here, command next-line is speech-enabled via an
after advice that causes the current line to be spoken after
every user invocation of this command; the pattern looks
roughly like the sketch below.
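A sketch modeled on that caption (not copied from the
Emacspeak sources); emacspeak-speak-line stands for
Emacspeak's line-speaking service:

```emacs-lisp
;; Speech-enable next-line: after the cursor moves, speak the
;; new current line, but only on interactive invocations so
;; that programmatic callers stay silent.
(defadvice next-line (after emacspeak activate)
  "Speak the line moved to."
  (when (interactive-p)
    (emacspeak-speak-line)))
```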
- Figure 5.1
- HTML pages on the WWW of the 1990s abound in
presentational markup. What does red text on a monochrome
display mean? What does it mean to (er) blink aurally?
- Figure 5.2
- A sample aural style sheet fragment for producing
audio-formatted renderings of Web information. Audio
formatting conveys document structure implicitly in the aural
rendering, allowing the listener to focus on the information
content.
- Figure 5.3
- The HTML-3.2 specification fails to separate the
underlying conversational gesture from its visual realization
even more dramatically than GUI toolkits do. In this example,
it is impossible to decipher from the markup that the current
dialogue expects the user to enter a name and age; in
HTML-3.2, there is no association between an edit field and
its label.
- Figure 5.4
- The AltaVista main page. This page presents a search
dialogue using a visual interface. Emacspeak presents a
speech-enabled version of this dialogue that is derived from
the underlying HTML.