Middleware and the eSCaped Web
Network middleware to adapt web content for mobile devices is a very dynamic area. New players are emerging on all sides. There has been work in this area for the purpose of disability access for several years, and the W3C Note on guidelines for mobile-usable documents relies heavily on the Web Content Accessibility Guidelines 1.0 Recommendation developed for disability access purposes. Here we consider how various middleware strategies perform against two usability criteria: fitness for use with small mobile devices, and fitness for use by people with disabilities.
The World Wide Web has been a flaming success. It greatly increased the accessibility of Internet resources to everyday people and exploded the demand for Internet connectivity as a result. Now the cell phone industry is trying to get into the act. Offers of "wireless Web" abound in the advertising media. Can this success be transported to a dialog space based on the limited interaction capabilities of a wearable mobile device? It is not clear.
Interestingly enough, there is no one, world-wide answer to the question "Can I make my Web application play on my WAP-enabled phone?" Currently in the U.S. the answer is 'no,' while in Japan it is 'yes.' The "wireless Web" in the U.S. resembles our first figure above. U.S. wireless Web service is primarily based on the Wireless Application Protocol (WAP), which does not use the Web-standard HTML language for what is transmitted to the phone. So connecting with this service only gains access to the limited set of Web sites whose operators have taken the trouble to edit their content or services into WAP-compatible WML (Wireless Markup Language) form. There are a few WAP-to-Web gateways available via the WAP network, but using them will probably convince you that the WAP architects were right not to assume that Web pages would deliver a service consumers want when presented in this limited interaction space.
On the other hand, in Japan, the DoCoMo phone processes HTML and accesses services that cover Japanese Web sites. This service is quite popular. Part of this may be due to differences in the capability of the cell phone platform that the wireless web-services run over, and part of it may be due to differences in the prevailing style of web design in the two countries. But it is interesting that based on this evidence, the answer to our question is a definite 'maybe.'
A wild card in the above discussion is the emergence of voice portals, or telephone-delivered services such as TellMe and BeVocal. These services use voice in both directions, as opposed to the existing Interactive Voice Response systems where the user input is via the telephone keypad. They serve information similar to what works on a WAP phone, but they don't require a special phone. All they require is your willingness to talk with a robot. This is a class of service that lets you keep your eyes on the road and still learn where the next Wendy's is up the road. The voice portal is a class of service that works for blind people "out of the box," with no adaptation. However, this is another nascent industry (like the wireless Web) where the existing data normalizations are not fully integrated with the W3C family of Recommendations.
Part of the magic of the Web was that it created an immense pool of information and service consumers. Tim Berners-Lee's strategy for universality worked, in large measure. This meant that any Web site had access to gazillions of people all around the world. The fact that the mobile Web was turning out to be an array of spaghetti-sized stovepipes, or a small row of pigeonholes, was not the effect that the developers wished. So there has been motion in the WAP Forum and the voice portal industry to try to recapture the "anyone, anywhere, connected to anything" capability that the Web seemed to promise.
This resulted in some workshops, one in Hong Kong in September, 2000 and another in Bristol, UK, in October, 2000. The first was titled "Multimodal Web" and the second "Device Independent Authoring." The first was mostly focused on what could be done to move toward convergence of voice and cell-phone-data service technology foundations. The second was more ambitious in the range of interaction spaces it considered targeting with common-source development of content. The workshops brought together people from varied backgrounds, so it is not surprising that a broad range of views were presented. These included some that amount to "It's just hard." Not too surprisingly, a countervailing "Sure, we can do it" note was struck by a leading database vendor. Developing content that gets served in a range of (user and device conditioned) views does sound a lot like database design, as suggested in our second figure, just above.
Precedents from Access Technology
At these workshops, especially in Bristol, the industrial participants came up against an interesting segment of the W3C community, the Web Accessibility Initiative (WAI). This is a domain within the World Wide Web Consortium (W3C) whose mission is to make sure that the "everyone" connected by the Web includes people with disabilities.
Accommodating the needs of people with disabilities has been the driver for many technical innovations down through the years, such as the typewriter and the long-playing record. If one looks for it, one will find that disability access to Web information has been the motivation for some interesting work in active filters and gateways applied to Web content. One of these is the ",tablin" option on the W3C Web site, which performs page transformations so tabular layouts will not confuse older screen readers that read only the screen and are unaware of table cell boundaries. The BETSIE gateway developed to enhance disabled access to the BBC Web site and Silas Brown's Gateway are two other examples worth visiting to get an idea of what can, indeed, be done.
An issue that confronts us in attempting to unify Web and phone services is that the collection of services crosses media lines. The same information will be presented in a phone service as an audio recording and on a Web site as text, for example. For this reason, it pays to snoop around for multimedia precedents. Here the engagement of the deaf community's interest in obtaining captions for TV and multimedia content is important. It has resulted in multimedia integration proposals like the W3C Synchronized Multimedia Integration Language (SMIL), and the desired effect can be experienced on the Web by visiting pioneering service providers such as the Able Channel of TV on the Web.
Another point of pressure to take a robust approach to diverse media in Web accessibility comes from the members and advocates of the population with reading-related disabilities. For this audience, words can be a barrier to understanding, and facing too many of them can be a turn-off. The graphic cueing of what one is talking about and how the site works becomes paramount for these users. To get an idea of what is involved here, visit the pioneering Web site of peepo.com.
These markets overlap substantially, and there ought to be a more integrated way to serve them. The experience gained generating access solutions suggests a few ideas that should help, but does not provide a complete solution ready for prime time.
The figure suggests that the diverse demands of different devices, different people, and people in different situations all require robustness from the web content or service designer. Upon reviewing the state of development of "mobile access to the Web" technology and markets with a background in gaining access for people with disabilities, a few ideas have come to us as to how this could work. We share the following for your consideration. What else can you think of?
For example, it appears that blind access and mobile access share an interest in better articulated information structure at a finer grain than what is captured in Web design or at least the HTML encoding of the Web content today. How so?
The dominant difference between the human:computer interface a blind computer user works with and the GUI (Graphical User Interface) interaction space that application designers assume has to do with the large, persistent buffer of human:computer shared information that the screen provides in GUI interaction, and that is largely absent when accessing computation via audio or Braille. Audio display is fleeting; none of it is static. In a windowing GUI the windows sit there on the screen and remind you of the set of things you were doing. Accessing the same multi-activity session by audio, one must cycle through the open windows to gather that information. Although Braille is quite different and does have some persistent display, the scope of that display is limited to a single line of perhaps as many as 40 characters.
An example of this effect is seen in access to tabulated data, such as transit schedules. "True" tables such as these (much HTML 'table' markup is simply used as a layout grid) have lots of similar data cells in the interior and headers ranged around the edges that explain what the data cells are telling you. When accessed in speech or Braille, the headers are "off screen" when the cell contents are read. This can lead to a breakdown in comprehensibility. High-end access technologies for blind users now incorporate functions to ensure that headers are used to help the user understand the contents of table cells. Through the efforts of the WAI, some additional markup capabilities were added to the HTML table model to ensure that the necessary cell-to-header relationships could be made explicit and unambiguous in HTML-encoded tables. Markup such as the links set up using the HTML 'headers' attribute reaches across the page to create a small pattern or logical neighborhood for each data cell. Thus, if a data cell needs to be presented in a small display, these connections in the markup make explicit what information from outside the clip region is essential and needs to be brought along.
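A minimal sketch of this kind of table markup may make the idea concrete. The route names and times here are invented for illustration; each header cell carries an 'id', and each data cell's 'headers' attribute points back to the header cells that explain it:

```html
<table summary="Weekday departure times for two bus routes">
  <tr>
    <td></td>
    <th id="route7">Route 7</th>
    <th id="route9">Route 9</th>
  </tr>
  <tr>
    <th id="morning">Morning</th>
    <td headers="morning route7">6:15</td>
    <td headers="morning route9">6:40</td>
  </tr>
  <tr>
    <th id="evening">Evening</th>
    <td headers="evening route7">17:05</td>
    <td headers="evening route9">17:30</td>
  </tr>
</table>
```

A screen reader reaching the cell "17:05" can announce "Evening, Route 7: 17:05," and a small-screen filter clipping out a single cell knows exactly which header text must travel along with it.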
The WML format that is used in sending content to WAP phones uses a 'card' unit of content encapsulation that is displayable in the space of one or two wireless-device screenfuls. This, likewise, is a small unit of content. Rational organization of the cards requires that the relationships among the small chunks of information be understood overtly, in a machine-interpretable way, at a finer grain than what it takes to put the same information up on the Web. Reviewing typical Web page designs, one can manually (by dint of human understanding of the intended effect) recognize that there is a rhetorical structure to the design that can be broken down into more or less card-scale units of content. Inspecting the HTML coding of the page, however, reveals that this rhetorical or content structure is not well captured or exposed there. If the pages posted to the Web met the disability needs for overt internal organization, they would be in much better shape for automatic filtering into WML cards, and vice versa.
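To give a feel for the card unit, here is a small WML deck, sketched with invented content; a deck bundles several cards, each of which is roughly one screenful, and links navigate between them:

```xml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
  "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <!-- First card: one screenful listing a nearby stop -->
  <card id="stops" title="Nearby stops">
    <p>
      Main St &amp; 3rd Ave<br/>
      <a href="#times7">Route 7 times</a>
    </p>
  </card>
  <!-- Second card: the detail the link above navigates to -->
  <card id="times7" title="Route 7">
    <p>Next: 6:15, 6:40, 7:05</p>
  </card>
</wml>
```

Carving a typical Web page into chunks of this size is exactly the step that today's HTML markup gives an automatic filter little help with.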
Similarly, to understand how to create a common information base that spans different media, it helps to review how SMIL (Synchronized Multimedia Integration Language) can be used to integrate a multimedia experience flexible enough to meet the needs of diverse users, including people with disabilities. The figure shows the three axes along which content is organized in the logical model of a SMIL presentation. There is sequence (SEQ) for time-ordering content elements (marked 'media' in the figure) into what comes first and what comes next. There is also a parallel (PAR) construct that defines streams that should be presented concurrently. Finally, there is a SWITCH construct that allows for the inclusion of ranges of alternatives; selection among these choices is performed by the SMIL player on the client side, depending on the user's situation and equipment. The implementation of these three capabilities in SMIL is very mix-and-match. This allows for flexible construction of composites containing concurrency and choice from whatever sort of resource building blocks you have. It also allows streams received from different network locations to be merged in the client process, which is relevant for the construction of command posts or consoles for net-distributed computations, as on the Grid.
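All three constructs can appear in one small presentation. The following sketch, with invented file names and SMIL 1.0-style test attributes assumed, plays an intro clip in parallel with either a caption stream or an audio narration (chosen by the player from the user's preferences), then moves on to the main content:

```xml
<smil>
  <body>
    <seq>
      <!-- SEQ: the intro comes first, the main clip after it -->
      <par>
        <!-- PAR: video and its accessible alternative run concurrently -->
        <video src="intro.mpg"/>
        <switch>
          <!-- SWITCH: the player picks the first alternative that
               matches the user's settings -->
          <textstream src="captions-en.rt" system-captions="on"/>
          <audio src="narration-en.au"/>
        </switch>
      </par>
      <video src="main.mpg"/>
    </seq>
  </body>
</smil>
```

A user who has requested captions gets the text stream alongside the video; everyone else hears the narration, all from one source file.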
Based on the WAI experience with composing accessible content in HTML and SMIL, one can be cautiously optimistic about the economic viability of single-source content creation for a range of media, at least for the common capture of content for voice and wireless-data services: for example, a route-planning function that helps you determine how to use the local public transit to get where you need to go from where you are.
This may hold even for broader bands of service as measured by communication data rate and interface size (in persistent bits of human:device shared content). In any case, the cadre of shared resources will be a pragmatic mix of more-encoded forms, which can be transformed automatically into sight, sound, and touch; and bundles of less-encoded alternatives, which cover the range of senses or media outlets by selection. An example of the first class of resource is text in HTML paired with a range of stylings in CSS. An example of the second category (a bundle of low-level data alternatives) would be the pairing of a caption stream with an audio stream within a multimedia presentation. For example, the credit to a sponsor might be just a logo or single word on the WAP screen, and a full, multi-word name in the voice dialog. In the Web page presentation, all three of these forms would be used: the logo as an image, the full name as its 'title' attribute, and the single-word shortening of the full name as the 'alt' attribute for speed reading.
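In HTML the sponsor-credit example comes down to one element carrying all three forms at once. The sponsor name and file name below are invented for illustration:

```html
<!-- Web page rendering: all three forms of the sponsor credit in one element.
     The WAP card would use only the alt text; the voice dialog would
     speak the full name from the title. -->
<img src="acme-logo.png"
     alt="Acme"
     title="The Acme Transit Information Company" />
```

A downstream filter targeting a WAP card or a voice dialog can then select the form that fits its outlet, rather than guessing at a shortening after the fact.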
Effective commercial or scientific exploitation of multi-outlet content sourcing will take forethought. It is not enough to design Web pages and then try to post-process them into content fit for alternate media. You will need to think ahead about what kinds of content or services are flowing through your process, and what kinds of interaction spaces they need to be available in. Apply some sort of model normalization discipline to this supply-versus-demand picture and you will have the basis for a cost-effective investment in resource reuse and the capability to connect with your customers over whatever communication channel they prefer.
It's Happening; Get Involved!
Where to go with this? Well, the story is not over. This area of technology is evolving rapidly. Tomorrow's third-generation cell phones will have greater bandwidth for data, and at some point in the not-too-distant future mobile phones will support simultaneous exchange of both voice and data. At that point the medium becomes distinctly more attractive to consumers and as a platform for interactive services. To have the kind of broad perspective needed to successfully design a broad-gauge content-sourcing or service-packaging strategy, you need to be involved in the communities where dialog is going on across broad swaths of the industry. Come join Trace personnel in pursuit of effective universal designs for infrastructure in community consultations like the following three.
The Grid Forum - The Grid Computing Environments Working Group in the Grid Forum is where people will be comparing what works and what doesn't in portal access to metacomputing. This group will be introducing an element of user-centric reality into the evolution of the Grid architecture. If you cannot participate, track what is happening. Key insights that you need to know about are likely to surface here first.
W3C/WAI - The Web Accessibility Initiative of the World Wide Web Consortium is an excellent chance to gain insight into human:computer interface realities from a broader base in human experience. And you know that the more diverse the vantage points from which you can observe something, the more depth you will gain in your perception of what is going on. In particular, the Evaluation and Repair activity in the WAI is a collection point for work in content transformation, and the Protocols and Formats working group is positioned to have up-front influence over the shape of W3C specifications while they are under development. Participate here and you will gain in the form of a more capable and robust base of Web technology.
NCITS/V2 - The National Committee for Information Technology Standards (NCITS) has just launched Technical Committee V2, Access Interfaces, which is tackling the problem of how to do business with the myriad intelligent devices of the emerging pervasive computing domain from personal and adaptive devices. This is both a friendly amendment with regard to the capabilities of the "universal" remote control, and an opening through which the personal digital assistants of people with disabilities can play in the "everyone, everywhere, connected to everything" game.