Modality Translation Services on Demand
Making the World More Accessible For All
Gottfried Zimmermann, Ph.D., zimmer@trace.wisc.edu
Gregg Vanderheiden, Ph.D., gv@trace.wisc.edu
Trace R&D Center, 2107 Engineering Centers Bldg., 1550 Engineering Dr.
Madison, WI 53706
Abstract
Two things must occur for a person to use information: the information must be accessible, and it must be presented to the person in an understandable way, or mode. This paper introduces the "Modality Translation Services" concept, which comprises a set of remote services that provide instant translation from one presentation mode to another, available anywhere at anytime. The paper explains these services and their potential applications, and shows how the concept could benefit both people with disabilities and people without disabilities who experience functional limitations.
Introduction
Thomas Jefferson's words, "Information is the currency of democracy," pertain to today's information society more than ever. Exclusion from information can keep a person from fully participating in society. The problem is not a shortage of information; indeed, we often experience information overload. The question is: how can we access information in the form we need (in the appropriate "currency") so that we can use it? For example, it would be unsafe for a driver to read e-mail on a display, because the driver's eyes are busy watching the road and traffic. However, a "text-to-speech service" could read the e-mail messages aloud. Another example is a blind person participating in a business meeting where a diagram is being discussed. Here, an "image description service" could provide a verbal translation of the visual diagram.
Concept
The "Modality Translation Services" concept comprises a variety of remote services that are available anywhere, anytime [1]. These services are becoming feasible as a result of recent advances in wide-area, high-bandwidth networks and wireless communication technologies.
Service Spectrum
Modality translation services render information from one presentation form (mode) into another. Within the wide spectrum of possible services, each service is tailored to a person's communication needs arising from temporary or permanent functional limitations (see Figure 1).
- Text-to-Speech facilitates eyes-free interaction and requires no reading skills of the user. A human reader, or a speech synthesizer, could deliver this service.
- Speech-to-Text facilitates real-time ears-free interaction without requiring typing or writing by the user. A human steno-typist, or automatic or human-assisted voice recognition technology, could deliver this service.
- Speech-to-Sign facilitates communication between a speaking person and a deaf person who signs. A human sign interpreter, or a signing avatar (a computer-animated character on a display), could deliver this service.
- Sign-to-Speech facilitates real-time communication between a deaf person who signs and a non-signing (hearing) person. A human sign interpreter, or an image and sign recognition system, could deliver this service.
- International Language translates text, or real-time speech, from one language to another. A human language interpreter, or a machine translation system, could deliver this service.
- Language Level simplifies text, or real-time speech, that is presented at a complex language (cognitive) level. A human interpreter, or an automatic information extraction system, could deliver this service.
- Image/Video Description provides a speech or text translation of a visual image or video. A human service provider, or an automated system with computer vision, text generation, and speech synthesis capabilities, could deliver this service.
Figure 1: Modality Translation Service Spectrum
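One way to picture the service spectrum is as a catalog keyed by source and target mode, where each entry names both an automated and a human provider, as the list above does. The sketch below is a hypothetical illustration, not part of the original concept: the `Mode` enum, the `ModalityService` record, and `find_services` are invented names. The two language services (International Language and Language Level) are left out for brevity because they keep the mode and change the language or its complexity.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Presentation modes used in this sketch (a hypothetical encoding)."""
    TEXT = "text"
    SPEECH = "speech"
    SIGN = "sign"
    IMAGE = "image"


@dataclass(frozen=True)
class ModalityService:
    """One entry in a hypothetical service catalog."""
    name: str
    source: Mode
    target: Mode
    automated: str  # fully automated provider
    human: str      # human provider, for when automation is not good enough


# Entries mirror the modality-changing services listed above.
CATALOG = [
    ModalityService("Text-to-Speech", Mode.TEXT, Mode.SPEECH,
                    "speech synthesizer", "human reader"),
    ModalityService("Speech-to-Text", Mode.SPEECH, Mode.TEXT,
                    "voice recognition", "human steno-typist"),
    ModalityService("Speech-to-Sign", Mode.SPEECH, Mode.SIGN,
                    "signing avatar", "human sign interpreter"),
    ModalityService("Sign-to-Speech", Mode.SIGN, Mode.SPEECH,
                    "image and sign recognition", "human sign interpreter"),
    ModalityService("Image/Video Description", Mode.IMAGE, Mode.SPEECH,
                    "computer vision with speech synthesis", "human service provider"),
    ModalityService("Image/Video Description", Mode.IMAGE, Mode.TEXT,
                    "computer vision with text generation", "human service provider"),
]


def find_services(source: Mode, target: Mode) -> list[ModalityService]:
    """Return every catalog entry that translates `source` into `target`."""
    return [s for s in CATALOG if s.source is source and s.target is target]


for service in find_services(Mode.SPEECH, Mode.SIGN):
    print(service.name, "|", service.automated, "|", service.human)
# Speech-to-Sign | signing avatar | human sign interpreter
```

Keeping the automated and human providers side by side in each entry makes it straightforward to fall back from one to the other for a given translation.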
Try Harder Feature
While some of these services can be provided in a fully automated manner today (e.g., speech synthesizers for the text-to-speech service), others may need human assistance for some time (e.g., the speech-to-text, language level, and image/video description services). Although emerging technologies will automate more services, early implementations may not be mature enough in some cases. In these situations a "Try Harder" feature could call on more powerful applications (advanced network services) and bring human assistance into the translation process when automated technology fails for certain environments or materials [2].
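The "Try Harder" idea can be read as an escalation chain: try the cheapest automated translator first, and move to more powerful network services and finally to human assistance only when the result is not good enough. The sketch below assumes, purely for illustration, that every translator reports a confidence score between 0 and 1 and that a fixed threshold decides when to escalate; `translate_with_try_harder`, the tier names, and the 0.9 threshold are hypothetical, not a defined interface.

```python
from typing import Callable

# Hypothetical interface: a translator returns (output, confidence in [0, 1]).
Translator = Callable[[str], tuple[str, float]]


def translate_with_try_harder(
    content: str,
    tiers: list[tuple[str, Translator]],
    min_confidence: float = 0.9,
) -> tuple[str, str]:
    """Escalate through tiers until one is confident enough.

    Tiers are ordered from least to most capable (e.g. on-device automation,
    then more powerful network services, then human assistance). The final
    tier is accepted without a confidence check. Returns (tier_name, output).
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, translate in tiers[:-1]:
        output, confidence = translate(content)
        if confidence >= min_confidence:
            return name, output
    # Last resort, typically a human: accept whatever it returns.
    name, translate = tiers[-1]
    output, _ = translate(content)
    return name, output


# Stand-in translators with fixed confidences, for illustration only.
tiers = [
    ("on-device recognizer", lambda s: (s, 0.6)),
    ("network recognizer", lambda s: (s, 0.95)),
    ("human captioner", lambda s: (s, 1.0)),
]
print(translate_with_try_harder("meeting audio", tiers))
# ('network recognizer', 'meeting audio')
```

Treating the last tier as unconditional reflects the assumption that human assistance is the fallback of last resort; a real service would also have to weigh cost and latency before escalating.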
Service Access Devices
To use these on-demand translation services, a person needs a device that connects remotely to a global, high-bandwidth network and renders information on a display or through other output units. Although any kind of computer system can serve as an access device, small wireless devices deliver the real "anywhere at anytime" feature to the user. Examples include handheld computers and cell phones outfitted with earbuds, buttonhole microphones, or eyeglasses with a built-in display.
Who are the users?
We identify four user groups that could benefit from the "Modality Translation Services" concept, differing only in the kind of functional limitation they encounter:
- People with permanent functional limitations such as hearing, visual, and cognitive impairments;
- People with temporary functional limitations, such as a car driver (who cannot use their eyes to read a display), a worker in a factory building (who cannot hear because of the noisy environment), or a manager in a meeting who needs accurate minutes (but cannot type as fast as participants speak);
- People using small (and wireless) Internet devices with restricted input and output capabilities (e.g. handheld computers, or cell phones);
- People facing information given in a different language (having insufficient reading or hearing skills in that language).
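Since the user groups differ only in the kind of functional limitation they encounter, one simplified way to model them is as a set of perception channels that are currently unavailable, whether because of an impairment, a situation such as driving, or a device without a display. The sketch below is a hypothetical illustration: the channel mapping, the `AVAILABLE` pairs, and `choose_presentation` are assumptions made for this example, and the language-related group is not modeled.

```python
# Perception channel each presentation mode relies on (hypothetical simplification).
CHANNEL = {"text": "vision", "image": "vision", "sign": "vision", "speech": "hearing"}

# Modality translations from the service spectrum, as (source, target) pairs.
AVAILABLE = {
    ("text", "speech"), ("speech", "text"), ("speech", "sign"),
    ("sign", "speech"), ("image", "speech"), ("image", "text"),
}


def choose_presentation(content_mode: str, blocked: set[str],
                        preferences: list[str]) -> str:
    """Pick the mode in which content should reach the user.

    `blocked` holds the channels the user cannot use right now, whether the
    cause is a permanent impairment, a situation (driving, a noisy factory),
    or a device without a display or speaker. Returns `content_mode` itself
    when no translation is needed.
    """
    if CHANNEL[content_mode] not in blocked:
        return content_mode
    for target in preferences:
        if (content_mode, target) in AVAILABLE and CHANNEL[target] not in blocked:
            return target
    raise LookupError(f"no usable translation for {content_mode!r}")


# A driver (eyes busy) receiving e-mail, and a signer (cannot hear) in a meeting.
print(choose_presentation("text", {"vision"}, ["speech"]))           # speech
print(choose_presentation("speech", {"hearing"}, ["sign", "text"]))  # sign
```

The same function covers a permanently deaf signer and a factory worker in a noisy hall; only the preference list differs, which mirrors the point that the groups differ in the limitation they face rather than in the services they need.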
Applications
Many of these services are already implemented in a human-assisted, semi-automatic, or fully automatic manner. Examples of the speech-to-text service include Ultratec's Instant Captioning technology [3] and the Classroom Captioner from Personal Captioning Systems [4]; for the speech-to-sign service, the Signing Avatar from VCom3D [5]; and for the international language service, the AltaVista Babel Fish translation service powered by SYSTRAN [6].
To make these services available to a broad user base, they should be embedded in a globally available telecommunication network. As part of the Partnership for Advanced Computational Infrastructure (PACI) [7], the Trace Center is currently investigating options and promoting feasible solutions for modality translation services on the Grid and other next-generation networks, services, and computational resources [8].
References
[1] Zimmermann, Gottfried & Vanderheiden, Gregg (2001). Translation on Demand Anytime and Anywhere. CSUN's 16th International Conference, March 19 - 24, 2001, Los Angeles, CA.
[2] Vanderheiden, Gregg (in press). Telecommunications Accessibility and Future Directions. In: Abascal & Nicolle (eds.), Inclusive Guidelines for HCI. Taylor & Francis Ltd.
[3] Ultratec Inc. Instant Captioning Technology.
[4] Personal Captioning Systems. Classroom Captioner.
[5] VCom3D, Inc. Signing Avatar.
[6] AltaVista. Babel Fish translation service (powered by SYSTRAN).
[7] Partnership for Advanced Computational Infrastructure (PACI), National Science Foundation (NSF).
Acknowledgements
The work reported in this paper was partly funded by the National Science Foundation (NSF) through the Universal Design/Disability Access Program (UD/DA) [8].
