Internet-Based Personal Services on Demand

Gottfried Zimmermann, Ph.D., zimmer@trace.wisc.edu
Gregg Vanderheiden, Ph.D., gv@trace.wisc.edu
Al Gilman, D.Sc., asgilman@iamdigex.net
Trace R&D Center, University of Wisconsin, Madison
August 16, 2001

Zimmermann, G., & Vanderheiden, G. (2002). Internet-Based Personal Services on Demand. In: Winters, J.; Robinson, C.; Simpson, R.; Vanderheiden, G. Emerging and Accessible Telecommunications, Information and Healthcare Technologies - Engineering Challenges in Enabling Universal Access. RESNA Press.

Abstract

Internet-based services tailored to a person's communication, information, and assistance needs could improve their quality of life. Examples include language translation for business and private communication, text transcription for lawyers and people with hearing impairments, assistance for drivers finding their way through unfamiliar environments, and assistance for people with mental retardation to help them live more independently.

This chapter introduces the "Modality Translation and Assistance Services on Demand" concept, a variety of remote personal services available anywhere and anytime, to enhance the lives of people with and without disabilities.  It identifies the research and development challenges for a telecommunication and information infrastructure aiming to provide personal services on demand.


1 Introduction

We are at the edge of radical technological changes in our environments. Recent and ongoing advances in telecommunications and information technologies are bringing the vision of pervasive computing closer to reality. Pervasive computing will allow a wide variety of devices and services to talk with each other, which in turn opens up new possibilities for personal services. While the future of machine-to-machine connectivity could make our lives easier, we still struggle with the very essentials of human-to-human conversation and human-machine interaction. For example, people may not understand each other because they speak different languages, or because one cannot hear and the other does not understand sign language. To overcome these barriers, we could harness emerging technologies to provide personal services on demand remotely. The following fictional example illustrates this.

Sarah is a lawyer working at a legal agency in New York.  Because she wants to get accurate minutes of a current trial, she uses a remote speech-to-text service to get text transcriptions of the court hearings.  The resulting text is automatically stored on her handheld computer. 

Back in her car, she decides to visit a new client who is deaf, Rob, located somewhere in the vicinity of New York. In order to get verbal driving directions, she speaks the name of the client to the Web-enabled car radio. When she arrives at Rob's office, he welcomes her by signing in American Sign Language. Instantly a clear voice from his pocket computer translates the signs into English. When Sarah talks to Rob, a signing avatar projected onto his glasses provides a speech-to-sign translation.

A week later, Sarah is on business travel, sitting in a restaurant in Tokyo. As she cannot read the Japanese menu, she points her handheld computer's camera at it. With a scanned image of the menu on the screen of her handheld, she taps on the menu items and gets an English translation of each, plus a symbol telling whether or not she may eat it on her special diet.


2 Personal Services on Demand

The "Modality Translation and Assistance Services on Demand" concept[1] includes a variety of services, like the ones described in the scenario above. These services are becoming possible as a result of recent technological advancements in wide-area, high-bandwidth networks and wireless communication technologies. The concept uses telecommunication technologies to allow people to call up services on demand at any time, from anywhere, and on a variety of access devices. The services are operated on a moment-by-moment basis, and users pay for a service only while they are using it.

2.1 A Diversity of Personal Services

Figure 1: Modality Translation and Assistance Services on Demand Spectrum

Modality translation and assistance services on demand can render information from one specific presentation form (mode) to another, or provide other forms of assistance on demand (see Figure 1).
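
The idea of rendering information from one mode to another can be pictured as a lookup keyed by source and target mode. The following is a minimal sketch only, not part of the concept as published: the mode labels, the `Service` and `ServiceRegistry` names, and the placeholder translation functions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical mode labels; the chapter does not define a fixed vocabulary.
Mode = str


@dataclass(frozen=True)
class Service:
    """One modality translation service (illustrative structure only)."""
    name: str
    source: Mode
    target: Mode
    translate: Callable[[str], str]


class ServiceRegistry:
    """Maps a (source mode, target mode) pair to the service that handles it."""

    def __init__(self) -> None:
        self._services: Dict[Tuple[Mode, Mode], Service] = {}

    def register(self, service: Service) -> None:
        self._services[(service.source, service.target)] = service

    def lookup(self, source: Mode, target: Mode) -> Service:
        try:
            return self._services[(source, target)]
        except KeyError:
            raise LookupError(f"no service translates {source} to {target}") from None


# Placeholder implementations standing in for real recognizers.
registry = ServiceRegistry()
registry.register(Service("speech recognition", "speech", "text",
                          lambda content: f"[text of] {content}"))
registry.register(Service("print recognition", "printed text", "text",
                          lambda content: f"[text of] {content}"))

service = registry.lookup("speech", "text")
transcript = service.translate("court hearing audio")
```

A request for a mode pair that no provider offers raises `LookupError`, which is where a real infrastructure would have to fall back on service discovery or human assistance.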

2.2 Try Harder

One barrier in trying to move forward is the fact that today there is no sophisticated, fully automated, and reliable implementation of any of the services described above. Instead, we still rely on human assistance to provide a service, or to verify and correct the results of a machine-provided service. For example, speech recognition software installed locally on a wearable computer may suffice for some face-to-face conversations if it is quiet and the people speak carefully and clearly. However, it may fail when there is too much background noise. In this case a more sophisticated (and more expensive) service implementation employing noise suppression, running on a powerful network computer, may be used in order to yield reasonable quality of text transcription. This implementation, in turn, might fail when dealing with a speaker's strong foreign accent. In this case a human-assisted service implementation could meet the user's needs. For example, a 're-speaking' method may be used where a person listens to the conversation remotely and re-voices everything distinctly into a high-quality speech recognition system, checking the output for errors.

Although fully automated services may be possible with future technologies, today's implementations are not as mature as needed in most situations. When an implementation falls short, a "Try Harder" feature could be used to promote the task easily to more powerful applications (network-enhanced services), or to bring human assistance into the automatic translation process. Thus a "Try Harder" feature would allow users to try the least expensive approach first, but have an easy way to escalate the power (and cost) as needed until the service works for whatever situation they find themselves in[2].

From the perspective of a service provider, the "Try Harder" feature is a convenient way to introduce future automated services today. Automated service implementations that are not yet reliable enough to sell on their own could be offered if they are backed up by more effective (albeit more expensive) services (humans or additional resources) until they are mature enough to be used stand-alone.
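
The escalation described in this section (local software first, then a network-enhanced service, then human assistance) can be sketched as a loop over tiers ordered by cost. This is an illustrative sketch under stated assumptions, not an implementation from the chapter: the `Tier` structure, the relative cost figures, the stand-in recognizers, and the `accept` callback (which stands for the user's judgment that the result is good enough) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tier:
    """One service implementation level (hypothetical structure)."""
    name: str
    cost: float  # assumed relative cost per use, for illustration only
    run: Callable[[str], Optional[str]]  # returns None when the tier fails


def try_harder(task: str, tiers: List[Tier],
               accept: Callable[[str], bool]) -> Tuple[str, str, float]:
    """Try the cheapest tier first and escalate until a result is accepted.

    Returns the name of the tier that succeeded, its result, and the
    total cost of every tier that was tried along the way.
    """
    spent = 0.0
    for tier in sorted(tiers, key=lambda t: t.cost):
        spent += tier.cost
        result = tier.run(task)
        if result is not None and accept(result):
            return tier.name, result, spent
    raise RuntimeError("no tier produced an acceptable result")


# Stand-in services that mimic the failure cases named in the text.
def local_recognizer(audio: str) -> Optional[str]:
    return None if "noise" in audio else f"local:{audio}"


def network_recognizer(audio: str) -> Optional[str]:
    return None if "accent" in audio else f"network:{audio}"


def human_respeaker(audio: str) -> Optional[str]:
    return f"human:{audio}"


TIERS = [
    Tier("local", 1.0, local_recognizer),
    Tier("network-enhanced", 5.0, network_recognizer),
    Tier("human-assisted", 25.0, human_respeaker),
]

quiet = try_harder("quiet talk", TIERS, accept=lambda text: True)
hard = try_harder("noise and accent", TIERS, accept=lambda text: True)
```

In this sketch the quiet conversation is handled by the local tier alone, while the noisy, accented one passes through all three tiers and accumulates the cost of each attempt, which reflects the trade-off between power and cost described above.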


3 Building an Infrastructure for Personal Services on Demand

Almost all of the parts of modality translation and assistance services on demand exist as automatic or human-provided implementations today. There is, however, no implementation of the network concept as a whole, and no infrastructure on which to build it. Building such an infrastructure for diverse Internet-based personal services on demand poses a number of challenges. We identify challenges in three areas: communication networks (global access to a reliable and secure network), middleware standards (common service framework), and service implementations (automated service model that does not rely on human assistance).

3.1 Challenges for Global Access to a Reliable and Secure Network

The concept is for people to be able to use modality translation and assistance services on demand in virtually all situations.  This means that people can tap into the future network of information and services using a number of different access devices (e.g. computers, wearable computers, public information kiosks, telephones, cell phones, PDAs, car equipment), outfitted with a variety of input and output devices (e.g. glasses with a built-in monitor and camera, earpieces, inconspicuous microphones) from any location (e.g. at work or at home, at school, on the road, on travel, in a tele-collaborative environment, in an emergency room).

As the network evolves to bridge the gap between the person requesting service and the service provider, the network connection has to be as reliable and secure as if the service provider were in the same room. The network must be able to provide the required bandwidth for different multimedia stream formats (text, audio, and video) with (almost) no time delay, and even be able to flexibly change bandwidth requirements on demand during a session. Eventually a network featuring a sophisticated "Quality of Service" (QoS) implementation could meet these requirements.

3.2 Challenges for a Common Service Framework

In order to develop a common service framework for modality translation and assistance services on demand a standard meeting the following requirements would be needed (this standard may be part of a broader standard for Internet-based services):

3.3 Challenges for an Automated Service Model

Humans can provide all services mentioned in the Modality Translation and Assistance Services on Demand concept today. Connecting to human-assisted remote services facilitated by a globally available and reliable network represents a time-efficient and flexible service provision model, as opposed to the traditional model requiring advance arrangement, traveling and on-site presence of the service provider.

However, automated service applications implemented in the local and network-enhanced layers could facilitate a more cost-effective service model. Among the services, some (speech recognition, international language translation, and print recognition) are available as automated services today, but often rely on human assistance for verifying results and making corrections if needed (human-assisted layer). For the other services (sign language, sign language recognition, assistance/mentoring, language simplification, and image/video description) there are no implementations yet mature enough to be used even in conjunction with human assistance.

For each of these services we identify research issues and challenges, mainly in the area of artificial intelligence and natural language processing, that need to be addressed in order to develop highly sophisticated, automated implementations for modality translation and assistance services on demand:

For all these services, the "Try Harder" feature allows for a smooth transition from inferior to superior automated implementations, and from automated to human-provided service implementations. Beyond the challenges of implementing the individual services in an automated manner, there is the overall challenge of an automatic "Try Harder" feature. This feature could be facilitated by the services implementing a probabilistic model that keeps track of the probability that each part of the provided output is correct. Then, based on heuristics, a more sophisticated service implementation (machine or human-based) could be automatically consulted for certain parts of the problem, or the whole assignment could be transferred to a superior (or inferior) service implementation, if appropriate.
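
One way to picture an automatic "Try Harder" heuristic is a confidence threshold applied to each part of the output. The sketch below is an assumption-laden illustration rather than a proposed design: the `Segment` structure, the threshold values, and the rule for handing over the whole assignment are hypothetical, and it covers only escalation to a superior implementation, not the transfer back to an inferior one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A part of a service's output with the service's own confidence."""
    text: str
    confidence: float  # estimated probability of being correct, 0.0 to 1.0


def auto_try_harder(segments: List[Segment],
                    escalate: Callable[[Segment], Segment],
                    segment_threshold: float = 0.8,
                    whole_task_fraction: float = 0.5) -> List[Segment]:
    """Escalate low-confidence parts, or everything if too many parts are weak.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    if not segments:
        return []
    low = [i for i, seg in enumerate(segments)
           if seg.confidence < segment_threshold]
    if len(low) / len(segments) > whole_task_fraction:
        # Too much of the output is doubtful: hand over the whole assignment.
        targets = list(range(len(segments)))
    else:
        # Only consult the superior service for the doubtful parts.
        targets = low
    result = list(segments)
    for i in targets:
        result[i] = escalate(segments[i])
    return result


# Stand-in for a superior (machine or human-based) service.
def superior_service(segment: Segment) -> Segment:
    return Segment(segment.text.upper(), 1.0)


draft = [Segment("hello", 0.95), Segment("wrld", 0.40), Segment("today", 0.90)]
revised = auto_try_harder(draft, superior_service)
```

Here only the middle segment falls below the threshold, so only that segment is passed to the stand-in superior service; had most segments been doubtful, the whole draft would have been handed over.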


4 Related Work

Applications and services that provide personal translation and assistance services similar to those described in this paper are envisioned by researchers, developers, and service providers. Some interesting research and development results in this area are mentioned below. Once a service infrastructure is built up, these service implementations (or their successors) could be integrated into the concert of personal services on demand.

The company Vcom3D produces signing avatar software[3] that uses scripting technology to convey Web content by signing (see Figure 2). The TEAM system[4] uses a machine translation approach to translate English sentences to American Sign Language rendered by an avatar application. The ViSiCAST project[5] features a signing avatar that translates standardized content of a weather forecast Web page into several European sign languages. The iCommunicator[6] is a computer-based system providing speech-to-text translation, and speech-to-sign-language translation via digital movie images. The SignTel interpreter[7] is a similar system. The Media Access Group[8] at Boston's WGBH provides text captioning and descriptive video services for the media industry. Ultratec in Madison, Wisconsin, provides a speech-to-text service called "FASTRAN" for telephone users with special premises[9]. The Trace R&D Center has demonstrated the application of speech-to-text translation to tele-collaborative environments at the SuperComputing conference 2001 in Denver, Colorado[10].

Figure 2: Signing Avatar 1.0 from Vcom3D

In the area of potential architectures, Foster and Kesselman (1999)[11] provide input for several aspects of the Modality Translation and Assistance Services on Demand concept, including communication and security issues. Postel and Touch (1999)[12] mention the convergence of media, of telephony, cable television, radio, and the Internet as a possible driving factor for a future network infrastructure. Foster (2000)[13] developed a set of requirements for an "Integrated Grid Architecture" and presents a candidate structure for this architecture.

Another potential architecture model is the eCommerce-driven Universal Description, Discovery and Integration (UDDI) standard[14], which aims to connect buyers, suppliers, marketplaces, and service providers within a global, open electronic business framework.


5 Conclusion

The "Modality Translation and Assistance Services on Demand" concept has the potential to improve the quality of life for everybody regarding human-to-human communication, access to information, and independent living. Moreover, it seamlessly integrates the needs of people with disabilities into a more general network-based service delivery model.

We have identified some challenges for building an infrastructure for Internet-based personal services on demand. Reaching this goal depends on advancements in three areas. First, computer networks are needed that can make these services available to anybody from any location, on a reliable basis. Second, Web-based middleware standards are needed to provide a service framework that accommodates the structural needs of a common service infrastructure. Finally, advances in the area of artificial intelligence and natural language processing are needed in order to create fully automated implementations of these services, thus lessening today's dependence on expensive human assistance.


6 References

[1] Zimmermann, G., & Vanderheiden, G. (2001, March).  Translation on Demand Anytime and Anywhere.  CSUN's Sixteenth Annual International Conference, Los Angeles, California, March 19-24, 2001.  Retrieved Aug. 14, 2001, from the World Wide Web.

[2] Vanderheiden, G. (in press). Telecommunications – Accessibility and Future Directions. In: Abascal, & Nicolle (eds.). Inclusive Guidelines for HCI. Taylor & Francis Ltd.

[3] Wideman, C.; & Popson, S. (2001). Sign Language Assistive Technology Offers Access to Digital Media.  Proceedings of CSUN's Sixteenth Annual International Conference on "Technology And Persons With Disabilities", Los Angeles, CA, March 19 - 24, 2001.  Retrieved Aug. 14, 2001, from the World Wide Web.

[4] Zhao, L.; Kipper, K.; Schuler, W.; & Badler, N. (2000). A Machine Translation System from English to American Sign Language.  Proceedings of AMTA-2000: Envisioning Machine Translation in the Information Future, Mexico, 2000.

[5] Verlinden, M.; Tijsseling, C.; & Frowein, H. (2001).  A Signing Avatar on the WWW.  International Gesture Workshop 2001, City University, London, April 2001.  Retrieved Aug. 14, 2001, from the World Wide Web.

[6] Teach The Deaf, Interactive Solutions, Florida. 

[7] SignTel Inc., Connecticut. 

[8] Media Access Group, WGBH Educational Foundation, Boston, Massachusetts.

[9] Ultratec (2000).  Some ultratec.com.  Service announcement Aug. 2000.  Retrieved Aug. 14, 2001, from the World Wide Web.

[10] Gores, N. (2002).  Building an Accessible Access Grid.  NCSA Access Online, Jan. 15, 2002.  Retrieved Jan. 25, 2002, from the World Wide Web.

[11] Foster, I., & Kesselman, C. (1999).  The Globus Toolkit.  In: Foster, I., & Kesselman, C. (editors). The Grid – Blueprint for a New Computing Infrastructure (chapter 11, pp. 259-278).  San Francisco: Morgan Kaufmann.

[12] Postel, J., & Touch, J. (1999).  Network Infrastructure.  In: Foster, I., & Kesselman, C. (editors). The Grid – Blueprint for a New Computing Infrastructure (chapter 21, pp. 533-566). San Francisco: Morgan Kaufmann.

[13] Foster, I. (2000). Building the Grid: An Integrated Services and Toolkit Architecture for Next Generation Networked Applications (Draft).  Retrieved Aug. 14, 2001, from the World Wide Web.

[14] Universal Description, Discovery and Integration (UDDI)


Acknowledgments

This work was partly funded by the National Science Foundation (USA) via the Alliance Partnership for Advanced Computational Infrastructure within the Education, Outreach and Training (EOT) program; and the National Institute on Disability and Rehabilitation Research (NIDRR), US Department of Education under grants H133E980008, & H133E990006.  Opinions expressed are those of the authors and not the funding agencies.

For more information see Modality Translation Services Program.