W3C home > Mailing lists > Public > wai-xtech@w3.org > July 2002

RE: generating voice and palm-top dialogs from a common model?

From: Al Gilman <asgilman@iamdigex.net>
Date: Thu, 11 Jul 2002 10:36:42 -0400
Message-Id: <>
To: wai-xtech@w3.org

For planning purposes, here is a sketch of an authoring process and device-independent
common-source resource web and adaptive tailored service framework that would seem
to have enough in it to work for people with disabilities.

There appears to be a candidate win plan for this problem.

1.  The first form of the application that one designs is the Voice view.  See
for roughly why.  It is mostly that the voice domain of interaction is something
that the author understands which requires the simple step by step steps to be 
articulated.  The author can understand "don't put too much on a page" more readily
and sympathetically when they think in terms of a voice dialog.

2.  The "foo should be in all contexts" rules (Exit, home, help, ...) from 
are made a
design constraint on the Voice dialog design, or at least on the interaction logic 
reference model.  The voice binding can leave some of this out so long as it is defined
in the interaction-logic references and can be pulled from there as needed.
The higher level information architecture (structure and navigation)
of the design still inherits design rules from the talking book model.  Katie has been
experimenting with this.  If we have actual conflicts between what the digital book 
and severe learning difficulties models seem to require, we go back to design and
experimentation to iterate on the model.  But I claim that they are more alike than
conflicting.  [reconcile with research and standardization efforts on universal speech
interface as well.]  

Summary: the definition of the abstract interaction logic of the application applies 
patterns from a reference model that has been cooked to be sufficient to ensure usability 
in a variety of different disability-driven [derived view] delivery contexts.

With the macrostructure precedent of the digital talking book, the micro flow precedent
of the severe learning difficulties web guidelines, and the media alternative precendents
of images on the web and sound/action in multimedia we have the bulk of the rules now available.

3.  Represent the interaction logic in the "controlled-appliance specification" model
of the Personal Universal Controller project under Pittsburgh Pebbles at CMU.

4.  Synthesize the chunking -- structure and flow -- of the frame sequence in a
palm or desktop visual derivative through an industrial-strength extension of the
algorithms that the PUC has developed for the Palmtop device class.  But the logical
complexity of the individual steps would be constrained by user requirements and not
just device constraints.

5.  Elaborate the rich media correlates of the interaction logic to fill the space
available.  Again where the computation of "space available" accounts for user
preferences in addition to device capabilities.

[advanced level:  the combination of user preference for rich media plus the 
content-instance availability of rich media can actually drive the logical complexity
per frame below that set by the constraints, in order to allow the "decorate to 
fill the space available" phase to have room for as much rich elaboration as the
user prefers.  The root method is global optimization under constraints, not just 
constraint satisfaction and local optimization.]

The point here is that the severe learning difficulty consumer, using a desktop or
TV sized display, would use the same chunking as the palmtop or cell phone consumer,
with few words and lots of graphics consuming the pixels of the screen.  But the
few words that were there would be the same words (or synonyms recognizable by thesaurus)
as compared with those one would use in a small-display device context such as voice 
or a WAP1 phone.

[I am only developing the structure-and-flow aspect here.  There is also a rich 
verisimilitude -- ususally called rich media -- aspect and a precise explanation
aspect -- addressed in Lisa Seeman's draft schema at


but I can't write the whole book at once...]

The state of the art would seem to be that adaptive synthesis of the structure and
flow of the dialog is beyond the capabilties of CSS but not beyond hope to get from
the "single source authoring" tool suites developed by the Device Independence
constituency.  Particularly with the positive results coming out of CMU.

The structure and flow would most likely be a server-side transform.

The rich decoration of eye candy on the bones of a cell-phone-capable structure and
flow could be a client-side matter.  Or it could be done on the server side too.
Certainly in the immersive (above desktop) client class it would be assembled on the
client side or at a supernode portal very close in the network to the client node.

The overall key fact is that we have separated some concerns: the step size of the dialog
(complexity of the individual steps -- screens in the SLD whitepaper) from the
degree of rich-media elaboration.  The trendy language for how one separates the concerns
is "aspect oriented programming."  The point is that a given body of data or code does
not address a single concern, but combines patterns inherited from multiple 
'aspect'-distinguished ancestors.  The standard web design would actually use less
of the rich media relatives of the interaction logic because it was packing 
more interaction complexity on one screen and didn't have room for the full elaboration 
of the audiovisual re-enforcement.  But "TV show mode" (SMIL multimedia presentation 
under "my preference is couch potato" conditions) would draw on all that audiovisual 
stuff and might only use the textual core in [normally hidden] subtitles and captions.

Note that in we-try-harder mode an assistive user agent could use the "precise
explanation" features to reach out onto the web to get more multimedia that relate to 
the interaction logic as defined by the author.  Not all the rich media content would
have to have been collected and offered by the originator of the core interaction logic.
Some of that could come from general public knowledge or assistive-dedicated picture and
sound thesaurus literature.  Or the explanatory elaboration layers could come from the
education market and not require development on a dedicated 'disability' basis.  People
with learning disabilities are less hung up over age stigma than are the children who
are passing through acquisition of learning skills and knowledge very rapidly and are
very sensitive to 'baby stuff' appropriate for stages that they feel they have passed.

Received on Thursday, 11 July 2002 10:37:53 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:51:28 UTC