T. V. RAMAN
Envisioning Speech
T. V. Raman wants to show me what he has been building on the nights and weekends when he is not working as a senior computer scientist at Adobe Systems. So I have come down to his apartment in Mountain View, Calif., to watch him play. As we sit in his spartan living room, decorated only with a NordicTrack, a partially solved five-by-five Rubik's Cube (adorned with Braille stickers) and a single framed poster of wolves, Raman powers up his laptop. The device comes to life with what sounds to my ears like a string of alien gibberish, like a compact disc on fast forward. Raman smiles: to the blind engineer, that is the sweet sound of connection. "I've gotten used to the thing talking very, very fast. It keeps me efficient," he chuckles, before slowing the speech rate down by about half so that I can follow along. Gibberish turns to stilted, robotic English, a voice familiar to me as that of Stephen W. Hawking, the renowned physicist, who uses the same type of synthesizer.
Feeling around the cushions of his couch for a telephone cord, Raman plugs in his modem and dials up his workstation at Adobe. As his hands fly over the keys, the movements of this 31-year-old immigrant from Pune, India, remind me of a virtuoso pianist. Each stroke elicits a distinct sound as his synthesizer intones a cacophony of letters, words, chords. Cowbells jangle when the computer has a question or a suggestion for him. As his World Wide Web browser loads, Bach's Toccata and Fugue plays. Within a minute or two, Raman is scanning the latest headlines from CNN and checking out hot stocks at the Wall Street Journal. His expression betrays a giddy adoration for this technology.
Raman can be forgiven a touch of nerdy technophilia, for without his work, it would be tedious if not impossible for the blind to do these things with a computer. Software he designed enables the sightless to read mathematical and scientific papers, to surf the Internet and to write their own programs almost as efficiently as the sighted do. Raman's ideas may soon find their place in the mainstream as well: his research for Digital Equipment and Adobe is wending its way toward the marketplace.
The path from Pune to Mountain View could not have been easy for Raman, but he waves off suggestions that he has overcome any great handicap. Glaucoma dimmed Raman's sight gradually during childhood. "By age 14, I couldn't see anything," he states without any hint of bitterness. The baby in a middle-class family of six, Raman, whose initials stand, respectively, for his hometown and his father's name, showed an early affinity for mathematics. He majored in the subject at the University of Pune, then applied for a master's program in math and computer science at the Indian Institute of Technology, the first blind student ever to do so. "I convinced the dean to allow students to satisfy their national social service requirement by reading the screen for me," Raman recounts. "I had to line up 13 students each semester."
At Cornell University, where he did his doctoral work, Raman got his first speech synthesizer, along with the most advanced screen-reading software then available: it simply spoke the text on display. "Imagine working with a one-line, 40-character display, instead of a nice, big 60-line monitor. That's what you're fighting against when you use a speech interface," Raman says animatedly. Worse than the tedium, the device rendered unintelligible many of the mathematics texts Raman needed to read. "Most of these papers were written in LaTeX [a notation used to typeset texts containing equations or symbols]. The program would come upon the code for an equation and start saying, 'Backslash backslash x caret something'; it was ridiculous," he laughs. "So I decided to write a nice weekend hack that would read LaTeX to me sensibly."
Mukkai S. Krishnamoorthy, a computer science professor at Rensselaer Polytechnic Institute, was taking his sabbatical at Cornell at the time. "Raman was working on a very ambitious thesis topic," he recalls. "He wanted to design a robotic guide dog that could navigate using the Global Positioning System. But it was going slowly, so I suggested he focus instead on improving computers' reading abilities."
Raman followed that advice as well as a clever approach suggested by David Gries: he constructed a high-level programming language that can control the way certain phrases and mathematical expressions are spoken by the synthesizer. Then he added a system that can take a file formatted in LaTeX, analyze it and render it aurally. Raman designed his program to translate the visual structure and style of the text into intuitive audio cues. Italicized passages can be read louder than normal. Chapter headings might be read by a baritone voice, footnotes by a soprano. A short tone could precede each item in a bulleted list.
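The mapping Raman describes, from visual structure to audio cues, can be sketched as a small table of rendering rules. This is a hypothetical Python illustration of the idea only; AsTeR's actual rules are written in its own audio formatting language, and all names here are invented for the example.

```python
# Hypothetical sketch of audio-formatting rules: map document
# structure to speech parameters, in the spirit of AsTeR's approach.
# AsTeR's real rules live in its own audio formatting language;
# every name below is illustrative, not AsTeR's API.

AUDIO_RULES = {
    "chapter":   {"pitch": "low",  "volume": 1.0},  # baritone headings
    "footnote":  {"pitch": "high", "volume": 0.8},  # soprano asides
    "italic":    {"pitch": "mid",  "volume": 1.2},  # louder emphasis
    "list-item": {"cue": "tone"},                   # short tone before each item
}

def render(element_type, text):
    """Return (cue, voice settings, text) for one document element."""
    rule = AUDIO_RULES.get(element_type, {})
    cue = rule.get("cue")                            # optional non-speech sound
    voice = {k: v for k, v in rule.items() if k != "cue"}
    return cue, voice, text

print(render("chapter", "Chapter 3: Audio Formatting"))
print(render("list-item", "first bulleted point"))
```

Because the rules are data rather than code, a reader could customize them, swapping the baritone for a different voice, say, without touching the renderer itself.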
Raman named the system AsTeR, ostensibly for "Audio System for Technical Readings," but actually after the frisky black Labrador that has guided him for six years. AsTeR's power lies in its ability to browse quickly through complicated material. Whereas one can skim through a book, find a page of interest and take in tables, fractions and integrals at a glance, audio is frustratingly linear. Yet it need not be one-dimensional. "If you have CNN on in the other room, you can always tell when the financial news is on-they play a distinctive noise in the background," Raman points out. AsTeR uses similar techniques to help listeners keep track of where they are. It also allows the hearer to interrupt its monologue and skip to another section.
Complex mathematical expressions can sound ambiguous or incomprehensibly long even when read aloud by experts. AsTeR relies on aural tricks to do the job. To speak nested exponentials, the program uses successively higher-pitched voices, rather than verbose descriptions, to indicate each level of nesting. When reading tables or matrices, it can pan the sound left and right to convey the position of each value. Most important, it can create all its audio cues from unembellished LaTeX documents written by authors who have never heard of AsTeR, and readers can customize AsTeR's cues. Fittingly, Recording for the Blind and Dyslexic in Princeton, N.J., used AsTeR to read Raman's thesis onto tape, the organization's first fully synthesized recording.
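The pitch trick for nesting can be made concrete with a toy example. The sketch below, assumptions and names my own rather than AsTeR's, walks a nested expression and assigns each level a higher pitch, so a listener hears the depth of an exponent rather than a verbose description of it.

```python
# Illustrative sketch (not AsTeR's implementation): raise the voice
# pitch one step per nesting level, so an expression like x^(y^z)
# is spoken with three distinct pitches instead of a long-winded
# "x to the power of y to the power of z".

BASE_PITCH_HZ = 110.0  # arbitrary starting pitch for the example

def pitch_for_depth(depth, step=1.12):
    """Each nesting level multiplies the base pitch by `step`."""
    return BASE_PITCH_HZ * step ** depth

def speak_nested(expr, depth=0):
    """Walk a nested tuple like ("x", ("y", ("z",))) and yield
    (pitch, token) pairs for a hypothetical synthesizer."""
    head, *subexpressions = expr
    yield pitch_for_depth(depth), head
    for sub in subexpressions:
        yield from speak_nested(sub, depth + 1)

# x raised to (y raised to z):
for pitch, token in speak_nested(("x", ("y", ("z",)))):
    print(f"{pitch:6.1f} Hz  {token}")
```

Panning a table's columns across the stereo field is the same idea on a different axis: position in the expression becomes position in auditory space.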
Although AsTeR helped Raman read and write technical papers, it did nothing to simplify the more pedestrian functions of his computer. The need for a better speech interface became even more pressing when Raman left Cornell to join Digital Equipment's Cambridge Research Lab. "A colleague, Dave Wecker, prodded me to apply the principles of AsTeR to a more general computer interface," Raman recounts. "But the challenge is that even though your program may know what is on the screen, that screen is not a simple paragraph of text but a complicated display with title bars and menu bars and scroll bars and messages popping up and cursors bouncing around. The amount of information is huge.
"I figured I'd build something quickly on top of Emacs [a text-based UNIX interface] to run on my laptop. After a few days, I had a first version that did almost nothing: it would just read the line beneath the cursor. But then I built an extension for the calendar, and I finally figured out that this approach could improve my life a hell of a lot."
To demonstrate why, Raman grabs his laptop. Aster (the dog) plops her head in my lap, and Raman scratches her back as he fires up the calendar. "Now," he says, moving the cursor to the beginning of a week, "this is how a screen reader interprets the calendar." The voice begins reading the numbers in the row of boxes, "Eight, nine, ten, eleven...." Raman cuts it off, giggling at its inanity. "Useless. A more natural way to convey the same information is like this." Another keystroke, and the computer intones the cursor's position as he has taught it to: "Wednesday, May 1, 1996."
"Now the text of what it said does not appear on the screen," Raman explains. "In fact, the program did not refer to the screen at all." Raman has exploited a way to modify the behavior of programs without changing the programs themselves. "Emacs allows you to 'advise' a function to run extra code after it is finished. So I simply advise the calendar to speak the complete date whenever I reposition the cursor. The great thing is," he says, exploding with enthusiasm, "the guy who wrote the calendar function has no idea I've done this, and when he releases a new version of the software, the speech enhancements will still work. It's a perfect parasite."
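The "perfect parasite" pattern Raman describes, running extra code after an existing function without editing it, is Emacs Lisp's advice mechanism. A rough Python analogue can convey the shape of the trick; the calendar and synthesizer stand-ins below are invented for the example, not Emacspeak's actual code.

```python
# Python analogue of Emacs's "advice" mechanism (all names here are
# hypothetical stand-ins, not Emacspeak's real functions): wrap an
# existing function so extra code runs after it, without touching
# the original program's source.
import datetime

def goto_date(cal, day):
    """Stand-in for the calendar's cursor-movement command."""
    cal["cursor"] = day
    return day

spoken = []  # stand-in for the speech synthesizer's output

def speak(text):
    spoken.append(text)

def advise_after(fn, after):
    """Return fn wrapped so `after` runs on fn's result."""
    def advised(*args, **kwargs):
        result = fn(*args, **kwargs)
        after(result)
        return result
    return advised

# "Advise" the calendar: speak the complete date whenever the
# cursor is repositioned. The calendar's author never hears of this.
goto_date = advise_after(
    goto_date,
    lambda d: speak(f"{d:%A}, {d:%B} {d.day}, {d.year}"),
)

cal = {"cursor": None}
goto_date(cal, datetime.date(1996, 5, 1))
print(spoken[-1])  # prints "Wednesday, May 1, 1996"
```

As in Emacs, the wrapper survives upgrades to the advised function: replace `goto_date`'s body and the speech enhancement keeps working, which is exactly the parasitism Raman prizes.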
Bit by bit, Raman added speaking capabilities to other Emacs programs, such as the tools he uses to write and test software. "A lot of people in the lab, including myself, started using tools that he was evangelizing," Wecker reports. "They were necessary for him, but they were improvements for us, because they allow you to collapse subroutines, even whole programs, into outline form." Raman adapted a public-domain browser for the Web to use his interface and distributes Emacspeak free on the Internet.
Meanwhile others are weaving new products from threads of his invention. Krishnamoorthy built a prototype Web service at Rensselaer that can run AsTeR for those who cannot run it themselves. "You simply paste the document to be read into a form, then the server processes it and sends you back a file for your speech synthesizer," the professor explains. Unfortunately, the project has been halted for lack of funding.
Since 1994 the Science Accessibility Project, led by John Gardner of Oregon State University, has continued to develop AsTeR. "Raman really pioneered this area of audio formatting," says Gardner, who is also blind. "The [audio-enhanced] Web browser is so much better than anything else I could possibly use. But there is still an awful lot to be done." Gardner's group just released a graphing calculator for the blind; he says the next version will use audio formatting. "If we can develop audio formatting for math and science, we can do it for bloody well anything," Gardner says.
Whether that includes mainstream applications remains to be seen. Raman is not leaving the matter to chance. He is working with Adobe to incorporate audio formatting into its popular portable document format, and he is a frequent speaker at conferences on the future of computer interfaces. On the Internet, he seems omnipresent, adding to his inventions, pushing the boundaries of technology and persuasively arguing for standards that will ensure that the flood of information lifts all boats.
-W. Wayt Gibbs in San Francisco