Modality Translation And Assistance Services:
A Challenge For Artificial Intelligence
Gottfried Zimmermann, Ph.D.; Gregg Vanderheiden, Ph.D.
Trace R&D Center, University of Wisconsin-Madison
2107 Engineering Centers Bldg., 1550 Engineering Dr., Madison, WI 53706 USA
Zimmermann, G., & Vanderheiden, G. (2001, July). Originally published in the Journal of the Austrian Society of Artificial Intelligence (OGAI), 20(2), 26-27.
This paper introduces the "Modality Translation and Assistance Services" concept: a variety of remote services, available anywhere and anytime, that enhance the lives of people with and without disabilities. It identifies the research and development challenges involved in implementing such automated personal services with Artificial Intelligence technologies.
Today, people with functional limitations such as hearing, visual, and cognitive impairments rely on human-assisted services like text transcription and sign language interpretation, which have to be arranged ahead of time and provided on site. This dependence can pose severe constraints, because the presence of other people is required in order to communicate, access public information, and live independently. However, recent advances in wide-area, high-bandwidth networks and wireless communication technologies could be used to provide personal services such as instant text transcription or sign language interpretation remotely and on demand. Moreover, these on-demand services would also benefit people without disabilities. A text transcription service could provide a speech input mode for small wireless devices with tiny or no keyboards; a manager could use the same service to obtain accurate minutes of an important business meeting instantly. In fact, there are many past examples of inventions that were created for people with disabilities and turned out to improve the quality of life for everybody (e.g., the typewriter and the telephone).
Modality Translation and Assistance Services Concept
Modality translation and assistance services render information from one presentation form (mode) into another, or provide other forms of assistance on demand. Within the wide spectrum of possible services, each service is tailored to a person's communication and assistance needs arising from temporary or permanent functional limitations. Users can connect to a service through a variety of stationary or mobile devices; examples include handheld computers and cell phones outfitted with earbuds, buttonhole microphones, and eyeglasses with a built-in monitor. While some of these services can already be provided on site in a fully automated manner (layer "local automatic services"), others require more advanced implementations and computational resources available through a wide-area network (layer "network enhanced services"). Still others may depend on human assistance for some time (layer "human assisted services"). Although more services will become automated as technologies mature, early implementations may not be robust enough in some cases. In these situations, a "Try Harder" feature could be used to harness more powerful applications on the network, or to bring human assistance into the automatic translation process when the technology fails in certain environments or for certain problem classes.
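The three service layers and the "Try Harder" escalation can be sketched as a simple fallback chain. The code below is an illustrative assumption, not part of the original service design: the class names, layer functions, and confidence threshold are hypothetical stand-ins for whatever quality estimate a real service would produce.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class TranslationResult:
    text: str
    confidence: float  # hypothetical 0.0-1.0 estimate of translation quality

# A service layer is modeled as a function that attempts a translation
# and reports how confident it is in the result.
Layer = Callable[[str], TranslationResult]

def try_harder(layers: List[Layer], request: str,
               threshold: float = 0.9) -> TranslationResult:
    """Walk the layers (local automatic -> network enhanced -> human
    assisted) until one produces a sufficiently confident result;
    otherwise return the best attempt seen."""
    best: Optional[TranslationResult] = None
    for layer in layers:
        result = layer(request)
        if best is None or result.confidence > best.confidence:
            best = result
        if result.confidence >= threshold:
            break  # good enough; stop escalating
    return best

# Illustrative stand-ins for the three layers described above.
def local_automatic(req): return TranslationResult(f"[local] {req}", 0.6)
def network_enhanced(req): return TranslationResult(f"[network] {req}", 0.85)
def human_assisted(req): return TranslationResult(f"[human] {req}", 0.99)

result = try_harder([local_automatic, network_enhanced, human_assisted],
                    "speech-to-text: meeting audio")
print(result.text)  # escalates past both automatic layers to the human one
```

Lowering the threshold models a user or application that accepts a rougher automatic result instead of escalating to (more expensive) human assistance.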
Toward an Automated Service Model
Humans can provide all of these services remotely today. However, automated services implemented in the local and network enhanced layers could enable a more cost-effective and scalable service model. Among these services, only one (text-to-speech) can be delivered solely by computers today. Others (speech-to-text and international language) are already available as automated services but still rely on human assistance to verify results and make corrections where needed. And for some services (speech-to-sign, sign-to-speech, assistance mentoring, language level, and image/video description) there are as yet no implementations mature enough to be used even in conjunction with human assistance.
Artificial intelligence could provide key technologies to facilitate automated implementations of modality translation and assistance services of tomorrow. Relevant research and development areas include:
- Speaker-independent voice recognition, including syntactic and semantic analysis for disambiguation. This could facilitate an accurate and robust text transcription service ("speech-to-text").
- Natural language processing and machine translation. Progress in this area could lead to reliable, automated language translation services ("international language"), and sign language interpretation services ("speech-to-sign", and "sign-to-speech").
- Human modeling and computer-generated animation. Progress in this area could allow a speaking interlocutor to be "morphed" into a signing avatar for a person who is hearing impaired, and a signing person to be shown as speaking on the screen of a videophone ("speech-to-sign" and "sign-to-speech").
- Image recognition and image processing. Face recognition and tracking of upper body movements are necessary in order to provide automated sign language interpretation ("sign-to-speech"). Advanced object and pattern recognition, combined with natural language generation, could eventually provide verbal descriptions similar to those provided by a human narrator ("image/video description").
- Expert systems with natural language interfaces. These could assist persons with cognitive and intellectual disabilities in making decisions in their daily lives ("assistance mentoring").
- Advanced information extraction and natural language technologies. These could transform complex sentences into a simpler language level or provide a digest of a complex document ("language level translation").
In all these areas, the "Try Harder" feature allows for a smooth transition from automated to human-provided service implementations. A probabilistic model should be part of the service implementations. By tracking the probabilities attached to the generated output and applying heuristics, a more sophisticated service implementation (machine- or human-based) could be consulted automatically for certain parts of the problem, or the whole assignment could be transferred to a more (or less) capable service implementation, as appropriate.
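A minimal sketch of such a probabilistic hand-off, under the assumption (hypothetical, not from the original) that the fast service attaches a probability to each output segment: only the segments falling below a heuristic cutoff are re-submitted to a more capable implementation, rather than the whole assignment.

```python
from typing import Callable, List, Tuple

# Each output segment carries the probability the fast service assigns
# to its own result (an illustrative assumption).
Segment = Tuple[str, float]  # (text, probability)

def route_low_confidence(segments: List[Segment],
                         fallback: Callable[[str], str],
                         cutoff: float = 0.8) -> List[str]:
    """Keep high-probability segments; re-translate only the rest with
    a more sophisticated (possibly human-assisted) implementation."""
    out = []
    for text, prob in segments:
        out.append(text if prob >= cutoff else fallback(text))
    return out

# Hypothetical draft from a fast automatic transcription service.
draft = [("the quarterly figures", 0.95),
         ("look [unclear]", 0.40),
         ("promising", 0.91)]

# Hypothetical slower-but-better service (machine- or human-based).
better = lambda text: text.replace("[unclear]", "quite")

print(" ".join(route_low_confidence(draft, better)))
# -> the quarterly figures look quite promising
```

Raising the cutoff shifts more work to the superior implementation; a cutoff of zero corresponds to accepting the automatic output wholesale.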
This work was partly funded by the National Science Foundation (USA) via the Alliance Partnership for Advanced Computational Infrastructure, and by the National Institute on Disability and Rehabilitation Research (NIDRR), US Department of Education, under grants H133E980008 and H133E990006. Opinions expressed are those of the authors and not the funding agencies.
For more information please visit the Trace UD/DA Project page.