Modality Translation on the Grid
Real-Time, On-Demand Access to Information Across Modalities
The concept of translation on the Grid addresses people's need to get information in a form other than the one provided by default. People who benefit from translation services include:
- people with permanent functional limitations such as hearing, visual and cognitive impairments,
- people with temporary functional limitations, such as a car driver (who cannot use their eyes to read a display), a worker in a factory building (who cannot hear because of the noisy environment), or a manager in a meeting who needs accurate minutes (but cannot type as fast as participants speak),
- people using small (and wireless) Internet devices with restricted input and output capabilities (e.g. handheld computers or cellphones), and
- people receiving information in a language in which they have insufficient reading or listening skills.
Since users' needs vary widely depending on the kind of "limitation" they encounter, a variety of translation services address these needs. Text-to-speech translation addresses people with visual impairments, but also facilitates eyes-free interaction for everybody. Speech-to-text translation addresses people with hearing impairments as well as people who cannot or do not want to use a keyboard for typing under certain circumstances. Speech-to-sign and sign-to-speech translation address deaf and hard-of-hearing people who use sign language. International language translation addresses the limitations of information exchange in a different language. Language/cognitive level translation addresses people with cognitive impairments. Finally, image and video description as text or speech addresses people with visual impairments, but also people who cannot or do not want to use their eyes under certain circumstances.
While some of these services can be provided in a fully automated manner today, others might still need human assistance for the next decade. Although more and more automatic services will be implemented with emerging technologies, the early implementations may not be as mature as needed in some cases. In these cases a "Try Harder" feature could be used that would switch from local automatic services to more sophisticated service implementations on the network, and finally could bring in human intervention to assist the automatic translation process as well as provide a backup solution.
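The escalation behind a "Try Harder" feature can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not a description of an existing implementation: the `Tier` and `Result` types, the `try_harder` function, the self-reported confidence score, the 0.8 threshold, and the three stand-in backends are all hypothetical. The sketch tries each tier in order (local automatic service, networked service, human operator), returns the first result that is confident enough, and otherwise falls back to the best result seen.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A backend takes source content and returns (translation, confidence in 0..1).
# The confidence score is an assumption of this sketch.
Backend = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    backend: Backend


@dataclass
class Result:
    text: str
    confidence: float
    tier: str


def try_harder(source: str, tiers: List[Tier],
               threshold: float = 0.8) -> Optional[Result]:
    """Escalate through the tiers in order.

    Returns the first result whose confidence meets the threshold.
    If no tier meets it, returns the most confident result seen,
    or None if every tier was unavailable.
    """
    best: Optional[Result] = None
    for tier in tiers:
        try:
            text, confidence = tier.backend(source)
        except ConnectionError:
            continue  # tier unreachable: move on to the next one
        result = Result(text, confidence, tier.name)
        if confidence >= threshold:
            return result
        if best is None or confidence > best.confidence:
            best = result
    return best


# Hypothetical stand-ins for real services; each just lowercases its input.
def local_engine(source: str) -> Tuple[str, float]:
    return source.lower(), 0.5   # fast, on-device, least accurate


def grid_service(source: str) -> Tuple[str, float]:
    return source.lower(), 0.7   # more capable service on the network


def human_operator(source: str) -> Tuple[str, float]:
    return source.lower(), 1.0   # human-assisted backup


TIERS = [
    Tier("local", local_engine),
    Tier("grid", grid_service),
    Tier("human", human_operator),
]
```

Calling `try_harder("HELLO", TIERS)` with these stand-ins escalates all the way to the human tier, while lowering `threshold` to 0.6 stops at the networked service. Ordering the tiers from cheapest to most expensive keeps human intervention as the last resort and as the backup when automatic tiers are unreachable.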
The Trace R&D Center is part of the University of Wisconsin in Madison. We are part of the EOT-PACI partnerships and currently investigate options and promote possible solutions for translation services within the Grid, particularly for deaf and hard-of-hearing users.
Text-to-Speech Translation
Today's speech technology produces acceptable quality for various applications. It uses either pre-sampled speech (with a restricted vocabulary) or synthetic speech.
Applications include:
- Voice menus for phone system
- Phone call - one participant with TTY
- Computer user with visual impairment
- Car driver accessing Web-based information (directions, flight schedules)
Speech-to-Text Translation
Today's speech-recognition software achieves reasonable recognition rates only with a restricted vocabulary or with speakers for whom the system was trained.
There are two technical implementations for human-assisted real-time speech-to-text translation services:
- A trained person using a stenographic keyboard.
- A dedicated speaker (for whom the system is trained) re-voicing everything that has been said.
Applications include:
- Voice menus for phone system
- Face-to-face conversation - one person with hearing impairment
- Business meeting - participant(s) with hearing impairment
- Phone call - one participant with TTY
- Business meeting - manager needs accurate minutes
- Car driver accessing Web-based information (directions, flight schedules)
- A worker in a factory building - cannot hear in a noisy environment
- A web user making an online purchase with a cell phone (or wireless handheld device)
Speech-to-Sign Translation
Sign language interpreters provide instant speech-to-sign translation on site. Given a high-bandwidth wide-area network (Grid), this service can be provided remotely. Researchers are now developing automatic real-time speech-to-sign translation using a "Signing Avatar." This multi-disciplinary approach combines several technological areas: speech recognition, linguistics (language representation), machine translation, sign language representation, and computer animation.
Applications include:
- Phone call - signing participant
- Collaborative environment - signing participant
- Business meeting - signing participant
Sign-to-Speech Translation
Sign language interpreters provide instant sign-to-speech translation on site. With a local video camera connected to a high-bandwidth wide-area network (Grid), this service could be provided remotely. Interdisciplinary research in the areas of computer vision, linguistics and machine translation aims to provide this service in a fully automated manner.
Applications include:
- Phone call - signing participant
- Collaborative environment - signing participant
- Business meeting - signing participant
International Language Translation
Internet translation services already provide instant text-to-text translation from one language to another. This technology still needs to mature. Ongoing research is also exploring automatic speech-to-speech translation for personal and business communications.
Applications include:
- Web page translation
- Email conversation with someone who speaks a different language
- Face-to-face conversation - different spoken languages
- Business meeting - participants with different spoken languages
Language / Cognitive Level Translation
Language/cognitive level translation is the art of transforming words and sentences to express the same meaning at a higher or lower language (abstraction) level. We are still far from any kind of automation for this service.
Applications include:
- Web page translation for a cognitively impaired Web user
- Facilitate conversation between people of different cognitive levels
Image / Video Description
Today, verbal descriptions of images and videos are created by humans. Once recorded, a description can be replayed by a machine whenever somebody requests it. We are still far from an automatic image/video description service, which would draw on technologies from the fields of computer vision, pattern recognition, knowledge representation and linguistics.
Applications include:
- Image description for web surfer with visual impairment
- Verbal description of environment for blind passenger
- Car driver accessing Web-based information (directions, exhibition floor plan)
- A manager with a handheld device (or cellphone) accessing a large PowerPoint organization chart
Gottfried Zimmermann, Gregg Vanderheiden, Al Gilman Trace R&D Center, University of Wisconsin-Madison [zimmer@trace.wisc.edu, gv@trace.wisc.edu, asgilman@iamdigex.net]