RE: generating voice and palm-top dialogs from a common model?

At 02:32 PM 2002-07-09, Roni Rosenfeld wrote:

>Al,
>
>Unfortunately I couldn't make it to ACL this year, but my student Thomas
>Harris, who has done most of the work on the USI/PUC interface, is at ACL
>right now.  By way of this email I am asking him to try to hook up with
>Debbie.

Hello, Tom; welcome to the thread.

>Briefly, one of our goals is indeed to generate speech interfaces
>automatically from Brad Myers' PUC specification instances.  We can do that
>now, albeit in a rather rudimentary way.  
>
>
>> * Is there somewhere a writeup that summarizes the commonality between the
>> technology-utilization profiles employed in the running code for USI and
>> PUC?
>
>[I am not sure I understand the question.  Perhaps Thomas could answer this
>better.]  We have an XML DTD for specifying the USI speech interface
>behaviour, and from which actual working speech interfaces are automatically
>generated.  We have also generated automatically instances of that spec from
>instances of PUC's XML document (note there are two distinct XML DTDs here:
>the PUC's and the USI's).

In terms of the XML Accessibility Guidelines, we don't credit a DTD with
"specifying the USI speech interface behavior," since that is beyond the
expressive capability of the XML DTD language.  We would probably have
stated it as "we have an XML representation for a reference form which
implies the speech interface behavior.  The specification of the behavior
implications is keyed to a DTD for that XML representation."  See
for example (WORK IN PROGRESS):

http://www.w3.org/WAI/PF/XML/

In other words, the behavior specification depends on the natural-language
material keyed to the things declared in the DTD.  A random, totally
conforming processor of XML DTDs would not know boo about speech, and could
not reproduce your behavior without knowing your out-of-DTD-band
connotations concerning speech behaviors.
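
To make the distinction concrete, here is a minimal sketch in Python.  The
element and attribute names are invented for illustration, not the actual
PUC or USI vocabularies: the DTD can declare that a command carries a label,
but the speech behavior lives in a separate table keyed to those declared
names, which is exactly the part a generic DTD processor never sees.

    # Minimal sketch: speech behavior keyed to names the DTD declares.
    # Element and attribute names are hypothetical, not the real PUC/USI ones.
    import xml.etree.ElementTree as ET

    # What a generic DTD-aware processor can see: structure only.
    instance = ET.fromstring(
        '<appliance><command name="volume-up" label="Volume +"/></appliance>')

    # The out-of-DTD-band part: behavior implications keyed to declared names.
    speech_behavior = {
        'command': lambda el: 'Say "%s" to activate it.' % el.get('label'),
    }

    for el in instance.iter():
        handler = speech_behavior.get(el.tag)
        if handler:
            print(handler(el))  # a pure validator would never produce this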

Your thumbnail is already very helpful.  You don't mention how the
inter-DTD transform is coded, but that is a detail.
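
For the record, here is the general shape I would guess such an inter-DTD
transform takes, whether it is written as XSLT or as procedural code; again,
every element and attribute name below is invented for illustration only.

    # Hypothetical sketch of a PUC-instance -> USI-instance transform.
    # Invented element names; the real transform could just as well be XSLT.
    import xml.etree.ElementTree as ET

    def puc_to_usi(puc_root):
        usi = ET.Element('usi-dialog')
        for state in puc_root.iter('state'):      # hypothetical PUC element
            slot = ET.SubElement(usi, 'slot')     # hypothetical USI element
            slot.set('name', state.get('name', ''))
            for label in state.iter('label'):
                ET.SubElement(slot, 'phrase').text = label.text
        return usi

    puc = ET.fromstring(
        '<spec><state name="volume"><label>Volume +</label></state></spec>')
    print(ET.tostring(puc_to_usi(puc), encoding='unicode'))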

>> * Are you just re-using the abstract form, the language 
>> specification of the 
>> PUC 'specification' format, or have you been generating Voice 
>> interfaces 
>> _from 'specification' instances_ developed in the PUC context without 
>> editing the 'specification' instance?
>
>As mentioned earlier, the latter is true, but in a rather modest sense.
>This is still an open area for development.  I don't think we are there yet.

Our threshold question is how much structure, plus how much manual testing,
is necessary to assure that the automatically generated alternative morph is
usable.  The auto-derivative doesn't have to be optimal; it has to be workable.

>For example, there is a strong case for modality-specific information to be
>provided at authoring time.  I'll mention one example here: labels.  Current
>instances of PUC specs allow the developer to specify multiple labels for
>use in visual displays.  One of these labels will be chosen automatically
>depending on display size limitations (e.g. "Volume +", "Vol +").  But for
>speech interfaces, the set of allowable labels may be different (e.g. "Vol"
>is unacceptable, and "volume up" or "louder" may be preferred).  It is
>conceivable that speech labels could be derived automatically from other
>labels, but I suspect this will result in suboptimal speech interfaces.

1. The problem is well known:

 http://trace.wisc.edu/handouts/sc2000/middleware_and_eSCaped_web

... or just Google for "escaped Web"

2. Your experience raises some hope that _if_ the *Voice* binding of a
"controlled-appliance interaction logic specification" has been
person-in-the-loop tested for adequacy of labeling, then the other morphs can
be extracted automatically and will be usable without requiring
pre-qualification by user testing.

The optimization of the other morphs is at the discretion and budget of the
developer.  They can add options to the label resources as required, without
breaking the capability to generate something usable.  The one constraint is
that you can't kill off an option that is required to cover a gap in the
[across-delivery-context] usability of the remaining labeling options.
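
A minimal sketch of what I mean by keeping the whole option set in the label
resources and letting each delivery context select from it; the labels echo
your Volume example, but the selection rules are invented for illustration.

    # Sketch: one label resource per control; each delivery context selects.
    # Labels follow Roni's example; selection rules invented for illustration.
    labels = {
        'volume-up': {
            'visual': ['Volume +', 'Vol +'],     # ordered longest to shortest
            'speech': ['volume up', 'louder'],   # speakable synonyms
        },
    }

    def visual_label(control, max_chars):
        # Pick the longest visual label that still fits the display.
        for text in labels[control]['visual']:
            if len(text) <= max_chars:
                return text
        return labels[control]['visual'][-1]     # fall back to the shortest

    def speech_grammar(control):
        # Every speech synonym stays available to the recognizer.
        return labels[control]['speech']

    print(visual_label('volume-up', 12))   # 'Volume +'
    print(visual_label('volume-up', 6))    # 'Vol +'
    print(speech_grammar('volume-up'))     # ['volume up', 'louder']

Trim the visual options freely; but kill off every speakable option and the
speech morph is left with nothing to cover its gap.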

But up to this point we have been stuck with content captured into
pre-optimized visual structures that elide some of the micro-orientation
(labeling) that voice requires.

Lacking the right glue to insert when the persistent display scale shrinks,
one ends up providing inadequate orientation in peephole views.

If we start with a running-code review of the interface design in a delivery
context (voice) that is very demanding of continual re-orientation to
context, we may be able to prune from there; whereas we couldn't just
interpolate to fill the gaps in the under-explained visual baseline.
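
A toy illustration of why pruning beats interpolation here; the labels and
the truncation rule are invented.

    # Toy illustration: pruning a speech-adequate label down to a peephole
    # view is mechanical; expanding a pre-truncated visual label back into
    # adequate speech is guesswork.  Labels and rules are invented.
    def prune_for_peephole(speech_label, max_chars):
        # Deterministic: drop trailing words until the label fits.
        words = speech_label.split()
        while words and len(' '.join(words)) > max_chars:
            words.pop()
        return ' '.join(words) or speech_label[:max_chars]

    def interpolate_from_visual(visual_label):
        # No deterministic rule: is "Vol +" "volume up", "louder", or
        # something else?  That information was elided at authoring time.
        raise NotImplementedError('cannot recover the elided orientation')

    print(prune_for_peephole('volume up one step', 9))   # 'volume up'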

 HCI Fundamentals and PWD Failure Modes
 http://trace.wisc.edu/docs/ud4grid/#_Toc495220368
 
Al

>-Roni
>
>
>> -----Original Message-----
>> From: Al Gilman [mailto:asgilman@iamdigex.net]
>> Sent: Tuesday, July 09, 2002 12:18 PM
>> To: Roni Rosenfeld
>> Cc: brad.myers@cs.cmu.edu; ncits-v2@nist.gov; wai-xtech@w3.org; Dahl,
>> Deborah A.
>> Subject: generating voice and palm-top dialogs from a common model?
>> 
>> 
>> 
>> Roni,
>> 
>> Hope this is not too late to catch you before ACL is over.
>> 
>> At the recent meeting of the INCITS V2 Standards committee, Brad Myers
>> stated that in your Universal Speech Interface work you have been
>> auto-generating speech interfaces from a reference model of 
>> the interaction
>> logic which is expressed in the 'specification' format 
>> developed under the
>> Personal Universal Controller activity related to the 
>> Pittsburgh Pebbles 
>> project.
>> 
>> This is very exciting news.
>> 
>> Under the aegis of the Web Accessibility Initiative, the Protocols and
>> Formats Working Group <http://www.w3.org/WAI/PF> has been 
>> leaning on the
>> XForms Working Group and the Voice Browsing Working Group to 
>> demonstrate
>> that they have, or are working toward by a clear roadmap, a 
>> specification 
>> for a model class that would serve as a single-source 
>> authoring basis for 
>> both voice and more visual forms-mode interactions.
>> 
>> Your work sounds like a new high-watermark in terms of 
>> demonstrating that
>> this can be done, and how.  But I may be interpreting it over 
>> optimistically.
>> 
>> This has radical implications for Web Services and how Device 
>> Independent
>> the W3C Multimodal Interaction work product can be.
>> 
>>  http://lists.w3.org/Archives/Public/w3c-wai-ig/2002AprJun/0057.html
>> 
>> For accessibility purposes, it would be extremely valuable if 
>> "what you
>> need to capture by way of interaction logic" were proven in 
>> multi-binding
>> experiments and captured into a "take home and build" 
>> realization such as
>> the XML syntax from the PUC project.
>> 
>> This could be a major factor (aspect) of the specification of 
>> a "universally
>> accessible" representation for Web Services.
>> 
>> Some of the questions I haven't been able to answer from a 
>> quick scan of your
>> home page are:
>> 
>> * Is there somewhere a writeup that summarizes the 
>> commonality between the
>> technology-utilization profiles employed in the running code 
>> for USI and PUC?
>> 
>> * Are you just re-using the abstract form, the language 
>> specification of the 
>> PUC 'specification' format, or have you been generating Voice 
>> interfaces 
>> _from 'specification' instances_ developed in the PUC context without 
>> editing the 'specification' instance?
>> 
>> I think that those two questions show the general direction 
>> of our interest.
>> Let me stop there for now.
>> 
>> Also, if you can possibly connect with Debbie Dahl while you 
>> are at ACL (presuming
>> that you will be there), please make sure she 
>> understands the answers
>> to the above questions, even if a review isn't available 
>> instantly on a public, 
>> WCAG1.0-AAA-accessible-HTML web page.  Debbie is chairing the 
>> Multimodal Interaction
>> Working Group within W3C and has a strong Voice background, 
>> so she will be a 
>> quick study.
>> 
>> Al
>> 

Received on Tuesday, 9 July 2002 15:02:00 UTC