- From: Al Gilman <asgilman@iamdigex.net>
- Date: Sun, 31 Mar 2002 11:01:53 -0500
- To: www-archive@w3.org
- Cc: <pjenkins@us.ibm.com>
[Disclosure and disclaimer. This is from a discussion on data formats for
defining multimedia interactive experiences on the WAI-PF list. There may be
inessential references to Member-confidential information in the prior
conversation, but it is believed that this post is free of taint of
confidentiality. Please address To: <asgilman@iamdigex.net> Cc:
<pjenkins@us.ibm.com> if you wish to clarify that point. Al

Quotable in its entirety with attribution (use www-archive URL), in public.
This message is claimed by its author to be free from confidential
information. However, nothing in this message shall be construed to
represent the position of the W3C, the WAI, or the PF working group. The
opinions expressed are those of the author and all errors and injustices are
his doing.]

There are actually two questions raised by Phill's good post.

a) What standards should be applied in reviewing VoiceXML (or the Voice
Browsing Profile with VoiceXML in place within it)?

b) Compare and contrast our expectations for VoiceXML and SALT.

My response is roughly:

a) The question "what scenarios is Voice Browsing technology required to
support?" cannot be answered definitively at the outset. We need to evaluate
multiple questions separately:

- What [related] scenarios are beneficial for people with disabilities?
- What service-equivalence-classes are required in similar pre-existing
scenarios?
- What beneficial scenarios could this technology possibly support with
tweaking?
- How readily achievable is the required extension or modification to
accommodate that scenario?

We need to make progress on all these sub-questions concurrently; we can't
expect a definitive answer to any before working on the rest.

b) We have to look at the nominal usage scenario of each and identify
reasonably nearby usage scenarios that matter for people with disabilities.
We don't have the unified theory of universal interfaces or content that
would let us derive the true requirements for each of these from a common
root. VoiceXML presumes an audio-only service-delivery context; SALT
presumes that audio plays a minority or supporting role in conjunction with
a lot of other display, including a large visual screen, and probably full
keyboard input.

Details follow.

At 03:18 PM 2001-11-02, Phill Jenkins wrote:
>
>The question I ask the p f working group is: "What is our objective in
>reviewing VoiceXML 2.0?
>- is it to review the spec and insure a user agent could be developed so a
>deaf or hard of hearing person could interact with application?
>- is it to review the spec to insure that a user agent (assistive
>technology) could be developed for a person with a mobility impairment
>(limited hand use) so that he would be able to interact with the
>application?
>- is it to make VoiceXML more multimodal so various input and output
>techniques can be used in a desktop environment?
>- or what?
>
>Before I begin the review, I think the VoiceXML group should agree with (or
>at least know) our objective.
>

AG: Very good questions. Charles pointed out the "at least there should be a
non-final-form option" clause that we agreed on with XSL FO. I don't
disagree with this as an "at least," but I don't think that our review
should in principle be limited to it.

Let me first try some line-by-line responses to the above questions.

>
>The question I ask the p f working group is: "What is our objective in
>reviewing VoiceXML 2.0?
>- is it to review the spec and insure a user agent could be developed so a
>deaf or hard of hearing person could interact with application?

AG:: Not just that a user agent could be developed. The policy that we
should point to as prevalent, if not universal, is: "if you offer a service
over the Public Switched Telephone Network (PSTN), then you SHOULD provide
an equivalent service that is usable by text telephone (TTY) [wherever
readily achievable?]." In some domains, that is to say in some political
jurisdictions and with regard to some services, I believe this SHOULD is
raised to a legal MUST.

Just what the standards should be in this area is a matter of current
debate. See the IVR Forum, for example, for an industry view, I believe.

  http://www.atis.org/atis/ivr/documents.htm

One of the facts of life is that Voice Browsing technology competes with
technology which just addresses the IVR market. We cannot assume that
society at large will, legally or otherwise, require Voice Browsing
applications that are used functionally like IVR systems to comply with
access policies that are radically different from the policies IVR systems
are required to meet. We don't want our demands on Voice Browsing technology
to put a ball and chain on it that makes it lose in the marketplace. On the
other hand, we don't want it to fail to be competitive, at the least in
cross-modal compliance, either.

My scenario for TTY compliance is that the telephone gateway detects TTY on
the main number, or there is a TTY-listed number parallel to the main number
for the Voice Browsing server's connection to the PSTN, hopefully migrating
to autodetection. For entry of phone numbers and similar codes, key entry is
usually more usable than voice recognition, so the Voice Browsing grammar
contains provisions, but not requirements, for a DTMF mode. In either case,
the Voice Browsing system has call control capabilities with which it can
auto-forward the call to a server, which may be hosted by a specialty TTY
service house and which implements the TTY-compatible parallel version of
the Voice Browsing service.

What we should consider suggesting as the policy ancestor or prior
requirement is that the Voice Browsing profile, that is to say the bundle of
specifications, at minimum a) takes all readily available steps to make this
parallel service-delivery chain trivial to implement, or optionally b)
builds the transform to the parallel service into its test suite or even
into the specification. How much should be built in is reasonably a matter
of discussion between them and us. Maybe we should not take a policy
position, maybe we should. But at the least we should clearly articulate the
evidence that we feel they should accept as objective: the layout of the
failure-mode or success-mode scenarios, and why we believe from parallel
examples that a given scenario will succeed or fail.

[ASIDE: The interface to the policy market -- the rules under which we
allocate MUST, SHOULD, MAY choices to clauses -- is a matter of irresolution
across the WAI. At least I claim there is no clear and agreed metapolicy.]

In any case, it should be part of our job to check that there is nothing
stupid in the specification that keeps it from being readily achievable to
provide a TTY-compatible service parallel to any voice-in, voice-out service
communicated over the PSTN. The media and the dialog management are damn
near identical. You just have to have text for all recorded audio that is
part of the dialog and understand a few TTY rules of procedure, I suspect.
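To make the "readily achievable" question concrete, here is a minimal sketch
in ordinary VoiceXML 2.0 markup of the two provisions just described: a
field that accepts either a spoken or a keyed (DTMF) response, and a
transfer that hands the call off to a parallel TTY-compatible service. The
grammar file names, the lookup URL, and the destination number are invented
placeholders, and TTY detection itself is assumed to happen in the telephone
gateway, outside the markup.

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form id="account">
      <field name="acct">
        <prompt>Please say or key in your ten digit account number.</prompt>
        <!-- Parallel grammars: one voice, one DTMF, so key entry is
             provided for but not required. File names are placeholders. -->
        <grammar mode="voice" type="application/srgs+xml"
                 src="acct-voice.grxml"/>
        <grammar mode="dtmf" type="application/srgs+xml"
                 src="acct-dtmf.grxml"/>
        <filled>
          <submit next="http://example.com/lookup" namelist="acct"/>
        </filled>
      </field>
    </form>
    <!-- Call control: if the platform reports a TTY caller, the application
         can route to the parallel text-telephone service. The destination
         number is a placeholder. -->
    <form id="tty_handoff">
      <transfer name="tty" dest="tel:+1-800-555-0199" bridge="false">
        <prompt>Transferring you to our text telephone service.</prompt>
      </transfer>
    </form>
  </vxml>

Whether this is in fact "trivial to implement" depends on platform support
for transfer and on the gateway's TTY detection, which is exactly the kind
of thing our review should be checking.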
The rest is just to follow the emerging standard for tiers of service. The
service that talks to a TTY or to a voice-grade circuit connection is a
front tier, and there is a back-office Web service which represents the
business rules and data of the transactions. The TTY serving house and the
voice serving house both talk with the back-office server over the same
SOAP-borne Web service. (A minimal sketch of such a shared back-office call
appears further below.)

>- is it to review the spec to insure that a user agent (assistive
>technology) could be developed for a person with a mobility impairment
>(limited hand use) so that he would be able to interact with the
>application?

AG:: Here my gut reaction is a 'readily achievable' test. This is
compatibility with AT. Yes, absolutely, voice technology is very important
in building AT for people who have motor impairments and usable voice. On
the other hand, we need to look at the scenarios for what sort of technology
one would actually use here. This sounds more like a candidate for the SALT
technology than for the VoiceXML technology.

In fact, desktop access to the back-office interface of the Voice Browsing
service is of very real interest for people with visual, not so much motor,
disabilities. Not that the access should not be available to all if it is
available to one; there are some exceptions to that, but our first
assumption is that all modes are available to all. But the person with a
motor impairment wants to take maximal advantage of everything they have
left. A well-presented visual/voice interface will be much more efficient
than a pure audio dialog, and this is a high-priority performance factor for
them.

In the "Escaped Web" piece (Google for that phrase) I argue that
single-source authoring will first be achieved across modalities which are
more similar, rather than across modalities that are more diverse. That is
not to deny that our ultimate goal is to have all served information
registered into a model which is both scalable and laterally mobile across
display, command, and cognitive modalities.

[Aside: I can't give you an exhaustive list of cognitive modalities. But we
can treat it as a given that "both show and tell" describes a diversity of
cognitive modality that is well known to be frequently beneficial in upping
the universality of the usability of an interface or information transaction
such as a document.]

>- is it to make VoiceXML more multimodal so various input and output
>techniques can be used in a desktop environment?

There is a valid question on the floor, which is "to what extent should W3C
be minimizing its overall constraint set so that it is Device Independent by
construction?" Certainly, the VoxML declarative dialog and recovery control
features that got left on the cutting-room floor are of interest in making
the abstract resource base more multimodal and device-independent. But we
can't just make non-negotiable claims here. Dave and Raman and Nils have a
point that should be considered, but weighed.

The W3C manner of working, filling in bricks before shaping the arch,
doesn't always lead to good results when the cathedral is done. Cathedral
vs. Bazaar is not a closed issue. The more Cathedral one thinks, the easier
it is to deliver accessible-by-construction technologies. Roger Gimson
[private communication] pointed out that "Enterprise thinkers will naturally
gravitate to single-source methods; UI prototypers may not warm up to the
idea so fast." The point is I have been through all this in CAD.
You have to give people deconstruction tools, because very often what will
first be achieved is one prototype that you know works, that is not
engineered for robustness or extensibility, and that only scratches the
surface of the potential market you could see if you got out of the box and
viewed the problem and solution in the right light.

>- or what?
>
>Before I begin the review, I think the VoiceXML group should agree with (or
>at least know) our objective.
>

AG:: Here, unfortunately, there is no way to get our criteria communicated
to them at 'the right time,' which is when they were writing their
requirements document (which is usually at too low a level anyway). We are
continually purifying our understanding of policy principles by
deconstructing our gut reactions to concrete scenarios, just as they are
creating a technological abstraction by fitting to a range of scenarios. At
a minimum, we should ourselves try to segregate:

- Laws of mathematics and logic, or otherwise unavoidable constraints.
- Precedents in extant social policy, whether in the W3C charters or in UN
or governmental utterances.
- Summaries of human performance that are well known in the HCI and
usability engineering field.
- Conclusions as to design constraints on technology.

The example here is from the sequence:

- Only the user knows for sure what works for them and what doesn't -- this
can be established by mathematical analysis of the information available to
the author or software designer vs. the user.
- Even the user doesn't usually know for sure until they have tried [some
of] the options -- this is HCI consolidated knowledge. It is descriptive of
the demographic facts.

The above two 'principles' are applied as our argument for a conclusion, or
_derived design principle_, which is "author proposes, user disposes."
Service delivery chains should exhibit this protocol. That is a matter of
judgement; the evidence that should be agreed to be objective is the prior
two points.

There are three policy tests in common use in disability access: "readily
achievable," "undue burden," and "reasonable accommodation." The first is
more used with casual contacts with the public, and the last with ongoing
relationships, such as employment, with a known individual. What is
considered "undue burden" is probably different in these two cases. I have
argued in the GL policy debates that "readily achievable" makes a _prima
facie_ case for something to be required, but that "undue burden" depends on
looking at scenarios to determine the nature of the burden and the price
factors that should be applied to that burden in scenario context. Here we
have to include in the decision factors the relative ease or difficulty of
alternatives from both the user and service-offeror sides.

The ultimate in accessible-by-construction is not readily achievable until
we can give service designers a reference model that is a) scalable both in
part/whole scale and in generic/specific generality, and b) demonstrated
effective by multiple bindings to diverse concrete user interfaces. We don't
have this yet, so under the "reasonableness cuts both ways" clause we have
to be negotiable on incrementalism. But we should still be looking for
readily achievable further increments that belong in the spec and get us
closer to the "everything is device-independent and therefore accessible by
dint of specification compliance" asymptotic goal.
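Returning to the tiered-service point above: purely as illustration, this is
the general shape of a SOAP 1.1 request that either front tier, the TTY
serving house or the voice serving house, might POST to the shared
back-office service. The host, service path, operation name, and namespace
are invented for the sketch; the real interface would be whatever the
back-office Web service actually defines.

  POST /backoffice/AccountService HTTP/1.1
  Host: example.com
  Content-Type: text/xml; charset="utf-8"
  SOAPAction: "http://example.com/backoffice/GetBalance"

  <?xml version="1.0" encoding="UTF-8"?>
  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
      <!-- Hypothetical operation: the same request body is sent regardless
           of whether the caller reached the front tier by voice or by
           TTY. -->
      <GetBalance xmlns="http://example.com/backoffice">
        <account>1234567890</account>
      </GetBalance>
    </soap:Body>
  </soap:Envelope>

The point of the shared back end is that the dialog-specific front tiers
stay thin: whatever the back office can do for the voice caller it can, by
construction, do for the TTY caller.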
>At first glance, SALT seems more about additional markup for mutimodal
>access (not necessarily accessibility), while VoiceXML seems more about
>controlling the phone interaction and conversation model.

AG:: Yes. The disability scenarios that are most clearly linked to SALT are
the reading-assist modes used by those with reading-related disabilities,
and the inverse, the provision of visual equivalents for audible stimuli in
the operation of computer systems.

There are Universal Design tests to be applied to SALT, for sure; but we are
not necessarily at a level of maturity in putting content and dialog under
one modeling umbrella that would let us explain their requirements as
derived from a common set of reference rules. We need to look more locally
into the disability scenarios that are related to the most common use of
these technologies and try at least to make sure that the local variations
transform as gracefully as is readily achievable.

So the question "what scenarios is Voice Browsing technology required to
support?" cannot be answered definitively at the outset. We need to evaluate
multiple questions separately:

- What [related] scenarios are beneficial for people with disabilities?
- What service-equivalence-classes are required in similar pre-existing
scenarios?
- What beneficial scenarios could this technology possibly support with
tweaking?
- How readily achievable is the required extension or modification to
accommodate that scenario?

We need to make progress on all these sub-questions concurrently; we can't
expect a definitive answer to any before working on the rest.

Al

>
>A quote from the SALT announcement [1]:
>
> SALT is a lightweight set of XML elements that
> enhance existing markup languages with a speech
> interface. SALT will thus extend existing markup
> languages such as HTML, xHTML and XML. Multimodal
> access will enable users to interact with an
> application in a variety of ways: They will be able
> to input data using speech and/or a keyboard,
> keypad, mouse or stylus, and produce data as
> synthesized speech, audio, plain text, motion video
> and/or graphics. Each of these modes could be used
> independently or concurrently.
>
>
>A quote from the VoiceXML 2.0 tutorial [2]:
>
>VoiceXML isn't HTML. HTML was designed for visual Web pages and lacks the
>control over the user-application interaction that is needed for a
>speech-based interface. With speech you can only hear one thing at a time
>(kind of like looking at a newspaper with a times 10 magnifying glass).
>VoiceXML has been carefully designed to give authors full control over the
>spoken dialog between the user and the application. The application and
>user take it in turns to speak: the application prompts the user, and the
>user in turn responds.
>
>[1] SALT http://xml.coverpages.org/ni2001-10-24-a.html
>[2] VoiceXML 2.0 Turtorial http://www.w3.org/Voice/Guide/
>
>
>The question I ask the p f working group is: "What is our objective in
>reviewing VoiceXML 2.0?
>- is it to review the spec and insure a user agent could be developed so a
>deaf or hard of hearing person could interact with application?
>- is it to review the spec to insure that a user agent (assistive
>technology) could be developed for a person with a mobility impairment
>(limited hand use) so that he would be able to interact with the
>application?
>- is it to make VoiceXML more multimodal so various input and output
>techniques can be used in a desktop environment?
>- or what?
>
>Before I begin the review, I think the VoiceXML group should agree with (or
>at least know) our objective.
>
>Regards,
>Phill Jenkins
>
Received on Sunday, 31 March 2002 11:01:57 UTC