Ex Parte Comments of the RERC on Telecommunications Access and the MobileASL Project - Public Safety Issues Related to Broadband

Please cite this document as follows:

Harkins, J., Kozma-Spytek, L., Williams, N., Hellström, G., Vanderheiden, G.C., Ladner, R., & Strauss, K.P. (Jan. 5, 2010). Before the Federal Communications Commission, ex parte comments of the Rehabilitation Engineering Research Center on Telecommunications Access and the MobileASL Project in the matter of public safety issues related to broadband communication to and from people with disabilities, GN Docket Nos. 09-47, 09-51, 09-137. Retrieved from http://fjallfoss.fcc.gov/ecfs/document/view?id=7020355300.

I.  Introduction

The Rehabilitation Engineering Research Center on Telecommunications Access (RERC-TA) and the MobileASL Project of the University of Washington submit these comments as an ex parte filing in the above-referenced proceeding, to supplement comments previously submitted by the RERC-TA in this proceeding (NBP #14) on December 1, 2009.  The RERC-TA is a joint project of Gallaudet University and the Trace Center of the University of Wisconsin, Madison, funded by the National Institute on Disability and Rehabilitation Research of the U.S. Department of Education [1].  The MobileASL Project is a project of the University of Washington's Department of Computer Science, and is funded by the National Science Foundation, Sprint, Nokia, and HTC [2].

Investigators and staff of these two centers collaborated to outline for the Federal Communications Commission (FCC or Commission) issues related to two-way video communication, with or without accompanying audio, for communication by people who are deaf and use a signed language, and people who are hard of hearing or adult-deafened and who use video for lipreading.  The authors of this comment have conducted a variety of studies in matters related to the issue at hand including intelligibility of low-bitrate communications for American Sign Language (ASL) production before broadband became available (Harkins); parameters for intelligibility and user ratings in signed languages (Hellström); evaluations of products under different network conditions for comprehensible signed language (Ladner, Hellström, Williams); research and development to improve performance of mobile devices for ASL transmission (Ladner); a test for measuring video quality for signs and lipreading (Hellström, Harkins); study of videophones' interactions with firewalls (Williams); and most recently, evaluation of mobile device audio-video parameters for listening and lipreading by hard of hearing people (Kozma-Spytek).

We have collaborated on these comments to give our current assessment of what is needed for broadband support for conversational ASL and audio/video communication using lipreading as a supplement to voice.  These comments also add an important precautionary warning about the effect of reduced audio quality on the accessibility of emergency voice communications to people who are hard of hearing. 

We understand that all of the technology elements that affect this performance are undergoing rapid change.  Processing speed on mobile devices is growing, screen quality is improving, and greater bandwidth is becoming more available over mobile networks, for example.  Netbooks also introduce a new interface for supporting high quality ASL while mobile. 

As these changes are taking place, the number of individuals using broadband video applications is also increasing, and pressures on networks may affect quality of services in media used for conversation.  For example, carriers are working to minimize the bitrates needed for voice, and these reductions may have adverse effects on people who are hard of hearing.  Without a defined quality of service for audio/video conversation and text conversation, especially in emergencies, users of these modes of communication will be more vulnerable than people who do not have disabilities.

These comments are closely related to the RERC-TA's comments submitted in response to the same public notice, in which we stressed the need for broadband services to allow the use of various disability-related applications, as well as the importance of addressing the interoperability and reliability of real-time text.  In this ex parte filing, we give a more technically specific view of current and future needs regarding video and audio conversational media.  Also, although these comments are primarily in response to the FCC's questions about broadband use for public safety purposes raised in NBP #14, they also relate to broader questions raised by the Commission regarding broadband and people with disabilities, including those raised in NBP #29.

"III. PUBLIC SAFETY COMMUNICATIONS TO AND FROM PERSONS WITH DISABILITIES

  1. We also seek comment on whether, how and what broadband applications can help first responders communicate with people with disabilities. Currently, for example, video remote interpreting allows facilitated person-to-person communications through sign language interpreters who are located off-site of the emergency. Can this application be used in an emergency context? Are there barriers to doing so, and if so, what are those barriers, and what are some possible solutions to overcoming those barriers?"

II.  Considerations for Effective Conversation by People Who are Deaf or Hard of Hearing

A number of factors must be considered for use of American Sign Language (ASL) in conversation over broadband.  The main human factors are sufficient smoothness and completeness in reproducing the motions of sign language (associated with frame rate); the ability to perceive the details that convey language content, including fingerspelling and eye gaze (associated with frame rate and achieved resolution); and the extent of delay introduced in communication [3].  For voice communication users who have hearing loss, it is well established in the literature that the ability to see and lipread the conversational partner greatly improves the intelligibility of spoken communication.  Although some asynchrony between the video and audio can be tolerated, too much asynchrony destroys the advantage of the video component.  Frame rate also affects the ability to use the video component.

Functional parameters

Delay.  End-to-end delay should be kept under 0.5 seconds for suitable turn-taking in a conversation.  Of this, network transmission may account for about 0.25 seconds.  Longer delays cause the same types of problems in signed communication as are found in delayed voice telecommunications: people tend to begin signing or talking at the same time, which is frustrating in everyday use and makes emergency communication ineffective.
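
As an illustration only, the delay figures above can be read as a simple budget.  The following Python sketch assumes the 0.5 second end-to-end limit and the 0.25 second network share stated in the text; the function name and the example processing delays are hypothetical and are not measurements from any study.

```python
END_TO_END_LIMIT_S = 0.5   # from the text: end-to-end delay should stay under this
NETWORK_SHARE_S = 0.25     # from the text: approximate share attributable to the network


def within_delay_budget(network_s, sender_processing_s, receiver_processing_s):
    """Return True if the summed delay stays under the end-to-end limit."""
    total = network_s + sender_processing_s + receiver_processing_s
    return total < END_TO_END_LIMIT_S


# What remains for capture, encoding, decoding and display on both ends.
processing_budget_s = END_TO_END_LIMIT_S - NETWORK_SHARE_S
print(processing_budget_s)                      # 0.25

# Hypothetical example values for the two terminals.
print(within_delay_budget(0.25, 0.10, 0.10))    # True  (about 0.45 s in total)
print(within_delay_budget(0.25, 0.20, 0.10))    # False (about 0.55 s in total)
```

Under these assumptions, the two terminals together have roughly 0.25 seconds for all video processing once the network has used its share.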

Frame rate.  In a recent experimental simulation of a cellphone with QCIF resolution, several frame rates and asynchronies were tested:  15 frames per second was adequate for lipreading by hearing device users while they listened to spoken sentences over a high-quality audio channel (AMR 12.2 codec).  In fact, 15 frames per second performed as well as 30 frames per second, but lower frame rates had a negative effect on participants' understanding [4].

For the signing application, although frame rates as low as 10 frames per second can be used with considerable slowing down of signing and fingerspelling (and possible misunderstanding between the parties), a frame rate of at least 20 frames per second is needed for clear and natural sign communication.  The receiver of signed communications in an emergency is typically an interpreter, who in most cases has ASL as a second language.  Thus a minimalistic approach (low frame rate) would lead to additional possibilities for misunderstanding, misinterpretation, and the need to repeat oneself.  There is a continual decrease in usability from 20 frames per second down to around 8 frames per second.  As the frame rate decreases, users must rely more and more on unnatural adaptations of the signed language.  They must slow their signing below normal rates, and fingerspelling in particular, which is a normal part of signing, must be done very slowly and with careful feedback for acknowledgement if the lower range of frame rates is used.  (This is analogous to forcing hearing people to talk more and more slowly, and to spell out proper names, in order to be understood on a call with increasingly choppy audio.  Signing at low frame rates is possible but undesirable, especially in an emergency, unless there is no alternative.)
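
To make the frame-rate figures concrete, the Python sketch below converts a frame rate into the time between frames and maps it to the bands described above.  The band labels are our paraphrase, and the hard cut-offs at 20 and 8 frames per second are a simplifying assumption: the text describes a gradual decline in usability, not sharp steps.

```python
def frame_interval_ms(fps):
    """Time between successive frames, in milliseconds."""
    return 1000.0 / fps


def signing_band(fps):
    """Map a frame rate to the usability band described in the text (simplified)."""
    if fps >= 20:
        return "clear and natural signing"
    if fps >= 8:
        return "usable only with slowed signing and fingerspelling"
    return "below the range discussed"


for fps in (30, 20, 15, 10, 8):
    print(fps, "fps:", frame_interval_ms(fps), "ms per frame,", signing_band(fps))
```

At 20 frames per second a new image arrives every 50 ms; at 10 frames per second the gap doubles to 100 ms, which is one way to see why rapid fingerspelling suffers first.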

In a small mobile device such as a smartphone (currently with diagonal screen sizes under 4 inches), the computational power for the production of sign must reside within the device.  Experience in Sweden has shown that some mobile videophones (handsets, not laptops) do produce intelligible sign at 64 kbit/s circuit-switched video, and the service is used extensively by deaf people there; but the ironic statement often made is that the most frequent expression in Swedish Sign Language over mobile is "What did you say?"  In other words, at such low bitrates there is a much greater likelihood of quality so low that voice users would consider it unacceptable in cellular communications.

Image quality.  The image must be sharp and detailed enough to resolve at least 170 points horizontally for reasonable usability, while user experience is much better at about twice that figure, around 350 points across.  As with frame rate, there is a gradual reduction in usability below these figures.

Audio/video asynchrony.  For people who are listening and watching the other party's face to supplement audio (hard of hearing users and hearing people in noisy conditions, for example), audio/video asynchrony affects the intelligibility of the message.  Based on preliminary evidence, at 15 frames per second, it is likely that asynchrony falling between a 100 ms audio lead and a 300 ms audio lag could be tolerated without significant degradation in understanding (compared to fairly ideal conditions of 30 fps and synchronous voice and video).  Note that these recommendations are based on a good-quality audio signal (AMR 12.2) and QCIF resolution.  The tolerable frame rate and asynchrony would probably be affected by a lower-quality speech signal and lower resolution.
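
The tolerance window above can be expressed as a simple range check.  The Python sketch below is illustrative only: the sign convention (negative values mean the audio leads the video) and the inclusive bounds are our assumptions, and the window itself rests on preliminary evidence at 15 frames per second with AMR 12.2 audio and QCIF resolution.

```python
AUDIO_LEAD_LIMIT_MS = 100   # audio may arrive up to 100 ms before the video
AUDIO_LAG_LIMIT_MS = 300    # audio may arrive up to 300 ms after the video


def asynchrony_tolerable(audio_offset_ms):
    """Check an audio offset against the preliminary tolerance window.

    audio_offset_ms: negative = audio leads video, positive = audio lags video
    (an assumed convention, not one defined in the text).
    """
    return -AUDIO_LEAD_LIMIT_MS <= audio_offset_ms <= AUDIO_LAG_LIMIT_MS


print(asynchrony_tolerable(-150))   # False: audio leads by more than 100 ms
print(asynchrony_tolerable(250))    # True: audio lags by less than 300 ms
```

The window is asymmetric: considerably more audio lag than audio lead appears to be tolerable.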

Technical requirements

Bandwidth.  Adequate performance for ASL can be achieved with modern video compression methods over an IP connection with bi-directional bandwidth as low as 150 kbit/s given the computational power of an ordinary laptop computer (in 2009).  However, bi-directional 300 kbit/s is needed for the desired higher resolution. 
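
For a rough sense of scale, the bandwidth figures can be converted into an average number of bits available per video frame.  The Python sketch below assumes that the stated figure applies to each direction and that all of it is available to video, ignoring audio, text, and packet overhead; these are simplifying assumptions for illustration, not statements about any particular codec or network.

```python
def bits_per_frame(bitrate_kbit_s, fps):
    """Average bits available to encode one frame (1 kbit = 1000 bits)."""
    return bitrate_kbit_s * 1000 / fps


for bitrate_kbit_s in (150, 300):
    for fps in (15, 20):
        print(bitrate_kbit_s, "kbit/s at", fps, "fps:",
              bits_per_frame(bitrate_kbit_s, fps), "bits per frame")
```

Under these assumptions, 150 kbit/s at 20 frames per second leaves an average of 7,500 bits per frame, and 300 kbit/s doubles that to 15,000 bits per frame.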

Current wireless broadband technology has been shown to be sufficient, but 4G networks will be much more effective.  It should not be necessary to specify a low bitrate for these applications, particularly for emergency uses, with emerging networks; however, if the requirements for communication by people who are deaf or hard of hearing are not carefully considered, there is a danger that streaming video for conversation will not be considered an essential aspect of bandwidth management, and access will be denied to this population.

In an emergency situation, the need for good quality in video communication will likely be even higher than in everyday use.  Emergency situations can impair communication conditions due to their stressful nature, making it even harder to adapt to poor transmission quality for video and audio signals.  Emergency situations can also be inherently noisy, with distracting video backgrounds and loud sounds that hinder communication.  Moreover, the need to conserve battery life during an emergency (e.g., to maintain a connection for a long period of time) may lead to trade-offs that at present would lead to lower-quality images. 

In the plans for NG-9-1-1 there are descriptions of three-party call situations with a caller, a call-taker and a video interpreter, all viewing each other in video.  Such valuable arrangements would greatly improve clear communication and would avoid having interpreters describe the scene of the emergency, but they require good capacity in both the communication terminals and the network.  The higher of the figures mentioned above, i.e., 300 kbit/s bidirectionally, should be the goal.

It should be emphasized that for signing people, the quality of the video channel is the most important part of the call.  Therefore, when differentiated quality of service is introduced in networks, signing users need to be allowed to get sufficient bandwidth and quality of service in the video stream to have the same intelligibility with signing as voice users get on the audio stream.  This will also benefit people who use voice communications with video as a supplement for lipreading and gestures to clarify the message.

Network delay.  Quality of service for conversational video should be established and become part of the national broadband plan.  End-to-end network delay for voice and conversational video should be similar.  Our proposal is a maximum of 0.25 second attributable to the network, since video processing on both ends will require some additional time.

Handset variables.  A camera on the same side of the device as the display, and short exposure times of less than 40 ms for the camera even in low light conditions, are other important requirements for mobile devices.  Currently in the U.S., to our knowledge, no carriers offer handsets meeting these criteria.  Despite wireless network build-out, equipment is not yet available to support signing.  We recognize that new handsets enter the market frequently, and we are hopeful that the situation will change soon.  We ask the FCC to monitor the entry of such products into the market.

Mobile devices with built-in chips for video compression can achieve usable results for ASL over video.  In addition, smartphones without such built-in chips may perform well when their processor speeds increase towards 1 GHz.  Technology that reduces frame rates when there are no quick movements (e.g., when the user is watching or listening) [5], user-controllable video output to conserve battery power if necessary [6], and other power-saving, higher-performance measures may be helpful if these features are implemented in handsets.

The availability of handsets that function well for video conversation bears monitoring by the FCC under its current obligation to implement Section 255 of the Communications Act and its likely obligations in the future to implement successor legislation that will reach broadband equipment and services [7].

Text

During emergency calls as well as everyday calls, there are situations in which it is most reliable to type some information and convey it as text during a call.  Different disabilities also call for different combinations of video, real-time text and voice during the call.  It is essential that text interoperability be achieved as well as audio and video interoperability, so that calls using any or all of these will be completed successfully.  The technical standards implemented in the emergency centers for handling emergency calls must be the same as those implemented for external communication by user communication systems, interpreter centers, and relay services.  The FCC has an important role in forceful harmonization of joint interface standards for calls in these three media, which together constitute "Total Conversation" [8].

Alignment in these harmonization efforts with the standards that the European Commission requires from the REACH112 project [9] is critical, so that global interoperability for emergency situations can be achieved.  This will permit industry to address a single set of requirements internationally.

Audio

Anecdotal reports indicate that a potential consequence of reducing bandwidth for voice communications may be reductions in voice quality that could disproportionately hinder the ability of hard of hearing people to understand speech on mobile telephones or other communication devices whose design is intended to minimize bandwidth.  In an emergency, as in the video condition, the need to hear clearly will be particularly important.  At a minimum level of precaution, audio quality for the general public's emergency communications should not fall below toll-quality levels long in place in the PSTN. 

III.  Conclusion

The National Broadband Plan will need to be clear that the accommodation of signed languages and lipreading must be supported in order to make networks accessible, including during an emergency. 

The frame-rate requirements for ensuring clear communication in signed languages and lipreading are comparable.  Twenty frames per second or more are desirable for conversation and interpretation; below 15 frames per second, both signing and lipreading functions begin to require communicative adaptation by users that is undesirable under the stressful conditions of an emergency.

Resolution parameters are more difficult to specify given different screen sizes and coding methods that improve the image quality by processing at each end-user's device.  However, based on current technology, bi-directional 300 kbit/s is needed for clear signed communication on mobile devices that have netbook- or laptop-sized screens.  The bar should not be set too low.  If it is, it will not accommodate signing or lipreading across a range of device types.

Quality of service definitions are needed with FCC backing, to ensure that video, text, and audio do not become unusable during an emergency.  Bandwidth management without consideration of these issues would have a disproportionately negative effect on signing people who are deaf as well as those who are hard of hearing or deaf and who use either voice alone or a combination of voice and video during conversation.

Technology is changing, with faster processing and greater bandwidth holding promise for more diverse uses of video, particularly in mobile devices.  Accompanying these improvements is a considerable increase in the use of networks for video applications.  As this increase occurs, if we are not careful, the quality of conversational video and even audio will deteriorate to a level that is not usable for emergency communications.  The FCC needs to exercise its role in ensuring that this does not happen, by specifying quality of service levels for all media used by people with disabilities:  video, text, and audio.  Interoperability and quality of service are critical to clear communications by people with disabilities during an emergency.

Respectfully Submitted.

On behalf of the RERC on Telecommunications Access [10]:
Judy Harkins, Ph.D., Linda Kozma-Spytek, M.A., C.C.C.-A, and Norman Williams, Gallaudet University
Gunnar Hellstrom, Omnitor AB (Sweden)
Gregg C. Vanderheiden, Ph.D., Trace Center, University of Wisconsin, Madison

On behalf of the MobileASL project:
Richard Ladner, Ph.D., University of Washington

Mailing address:
Dr. Judy Harkins
Gallaudet University SLCC 1116
800 Florida Avenue, NE
Washington, DC 20002
202-651-5677 voice
202-559-5622 VP

Of Counsel:
Karen Peltz Strauss
KPS Consulting
3508 Albemarle Street
Washington, DC 20008
202-363-1263


[3] International Telecommunications Union – Technical Standards Sector (ITU-T) Series H:  Supplement 1:  Application profile – Sign language and lipreading real-time conversation using low bit-rate video communication;  Hellstrom, G. (1997).  Quality Measurement on Video Communication. In Norby, K (ed.) Human Factors in Technology'97 (HFT'97), pp. 217-224.  ISBN 82-994236-0-0;  Delvert, J. (1997). Verification of quality requirements on videotelephony for sign language.  In Norby, K. (ed.)  HFT'97 pp. 225-231. ISBN 82-994236-0-0.

[4] Kozma-Spytek, L. & Sauro, J. (in preparation, 2009).  Video parameters for audio-visual speech intelligibility and usability by hearing aid users.

[5] N. Cherniavsky, J. Chon, J. Wobbrock, R.E. Ladner, E. Riskin. Activity Analysis Enabling Real-time Video Communication on Mobile Phones for Deaf Users. ACM Symposium on User Interface Software and Technology, UIST 2009, 79-88.

[6] N. Cherniavsky, A. Cavender, R.E. Ladner and E. Riskin. Variable Frame Rate for Low Power Mobile Sign Language Communication. Ninth International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2007), 163-170.

[7] H.R. 3101, The Twenty-first Century Communications and Video Accessibility Act, now pending before the House Energy and Commerce Committee, would extend the scope of Section 255 to services and equipment used over the Internet.

[8] Total conversation service: An audiovisual conversation service providing bidirectional symmetric real-time transfer of motion video, text and voice between users in two or more locations.  ITU-T Recommendation F.703 Multimedia Conversational Services, ITU, 2000.

[9] REACH112 (Responding to All Citizens Needing Help) is a European Union project aimed at specifying accessibility requirements for emergency calls in Europe:  www.reach112.eu

[10] The contents of these comments were developed in part with funding from the National Institute on Disability and Rehabilitation Research, U.S. Department of Education, grant number H133E090001.  However, the contents do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government.