- From: Al Gilman <asgilman@iamdigex.net>
- Date: Fri, 07 Sep 2001 23:21:05 -0400
- To: <andrew.hunt@speechworks.com>, <www-voice@w3.org>
At 07:30 PM 2001-09-03, Andrew Hunt wrote:

>Al,
>
>The WG has visited pronunciation and lexicon issues several times
>in the context of preparing the grammar spec. The following is my
>best effort at a summary of the overall group position:
>
>1. Version 1.0 of the Grammar Spec will not permit within-grammar
>   specification of pronunciations

AG:: Well, first off, this is very good to hear. I had it the other
way around. The way I read the grammar specification, it _only_
included "sounds like" tokens as terminals, without regard for
dictionary spellings. If your concept of grammar definition is that
the token terminals are words spelled as they would be found in the
dictionary, with how to expect them to sound clarified in the
optional referenced lexicon, then that is much better.

>2. The most recent draft (August 20) adds a <lexicon> element that
>   permits reference to an external pronunciation lexicon
>
>3. Review of RUBY found that it was not suited to the needs of the
>   grammar pronunciation problem. I cannot comment on the details
>   as other WG members did the review
>
>4. Pronunciation lookup should be outside the scope of the Semantic
>   Interpretation spec. Put differently, the grammar spec should be
>   complete and stand-alone on the topic of pronunciations

AG:: As may be clear by now, our concern is with whether the
definition of the input to which the system responds is captured in
a form [orthodox spelling] usable with visual or tactile media (such
as Braille). I believe that standard spelling is "semantic enough"
for the desired "repurposability" in this respect.

>5. The specification, as currently drafted, is sufficient to produce
>   interoperable implementations for a majority of use cases (but
>   certainly not all)
>
>FYI, the Voice Browser WG has begun work on a Pronunciation Lexicon
>specification, with a Requirements document released earlier this
>year. I do not anticipate a first working draft of that before the
>end of this year.
>
>> It is not clear that WAI can approve this document going to PR until
>> that mechanism is defined and it is clear it works.
>
>Could you please expand upon this comment so that we can have a
>sense of the benchmark that the spec will be reviewed against.

Let me offer a bit of a disclaimer first. The WAI is a very
shirtsleeve operation, at least as you are encountering it here. We
have some formal Recommendations out and some in the pipe, but I
would have to say there is nothing of record that would qualify as
"the benchmark against which the specification will be measured." So
let me instead frame the following as "the performance axis along
which the specification will be measured" rather than "the threshold
of acceptability it must meet." I hope this makes some sense.

The rough idea is that the model from which the voice-accepting
dialog is generated should contain sufficient information that a
dialog using other-than-speech input [screen forms, command-line
text, ...] could be derived [by a program, not a programmer -- but
it could be a large and compute-intensive program] as an alternate
view or access mode of the service, for individuals with no vocal
capability or with a voice that the recognizer won't accept. If the
standard way of tooling up a voice dialog in W3C technology does not
create a repurposable resource base under it, then this could be
considered a serious problem.
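To make that concrete, here is a rough sketch of what I now
understand points 1 and 2 to provide. The element and attribute
spellings are my best guess from the August 20 draft, and the
lexicon URI is invented for illustration; the point is that the
terminals stay in dictionary orthography while the pronunciation
hints live in the referenced external lexicon:

   <grammar version="1.0" xml:lang="en-US" root="command">
     <!-- pronunciation data is referenced, not embedded -->
     <lexicon uri="http://example.org/lexicons/commands.lex"/>
     <rule id="command">
       <one-of>
         <!-- terminals in standard orthography, usable as-is
              for visual display or Braille transcription -->
         <item>read</item>
         <item>reply</item>
         <item>delete</item>
       </one-of>
     </rule>
   </grammar>

Under that reading, a forms or text-menu rendering of the same
service could be generated from the alternatives in the rule without
ever consulting the lexicon.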
Voice recognition is regarded as a user-interface device, in the
sense that we strive for enough repurposability, or alternate media
morphs, to avoid the single-point failure modes induced by
dependency on a specific device. Some people can't use a mouse, some
can't use a visual display, some can't use a voice recognizer. Can
we make the W3C technologies so that they don't shut any of these
people out? That's what we are trying to do.

For Deaf customers, their speech may or may not be recognizable, but
the voice prompts are not usable. Is the logical content there to
serve the same service over text telephone, with an auto-switch at
the call-control level to an alternate port serving the
text-telephone flavor of the dialog?

So what we are shooting for is that any device-specific realization
of a service-delivery channel be backed with more robust resources
capable of being the base for delivering equivalent service in other
interaction modes.

This is a desideratum, untried in the court of a 'to the mat'
struggle at the Director's desk. Partly we have some inclination to
listen to reasonable arguments as to what is readily achievable, and
partly we are all discovering how to do it as we do it. One is
always in the awkward position of hindsight. One of the things we
find ourselves doing is trying to accelerate the adoption of a
device-independent perspective on all sides, even as the more
device-specific technologies are, as we speak, just being cooked.
And it is not clear what the device-independence architecture is
until there are separate voice-browsing and XForms proposals on the
table to review.

But as I noted above, I had read the grammar to say the terminal
tokens were by definition phonetic, with no certain relationship to
anything recognizably tied to dictionary meaning or Braille
transcription. If they are standard text, and the linkage to
phonetic hints for the recognizer is more distant, then the issue is
not of the magnitude I thought it was when I wrote before.

Al

>
>Regards,
> Andrew Hunt
>
>> -----Original Message-----
>> From: www-voice-request@w3.org [mailto:www-voice-request@w3.org]On
>> Behalf Of Al Gilman
>> Sent: Wednesday, August 29, 2001 4:47 PM
>> To: www-voice@w3.org
>> Subject: how are orthographic equivalents for tokens documented?
>>
>>
>> For speech recognition, one may wish to use as a token a phonetic
>> equivalent of a word. This may be done by phonetic spelling in the
>> usual writing system of the current language, or by
>> external-production-reference to a speech-synthesis grammar
>> production for a more precise phonetic form, if I understand the
>> architecture.
>>
>> In any case, it is important to unambiguously capture the
>> relationship to the standard spelling of a word, if the token
>> represents a word. "The standard spelling" is meant to indicate
>> "as you would look it up in a dictionary."
>>
>> There is an XML precedent in RUBY that could be followed for
>> providing this capability.
>>
>> Is this planned to be covered in the 'semantic processing' volume?
>> This is an important function. Please clarify.
>>
>> Al
>>
>> ["personal opinion" disclaimer]
>>
>> It is not clear that WAI can approve this document going to PR
>> until that mechanism is defined and it is clear it works.
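For reference, the RUBY precedent mentioned in that earlier message
pairs a base text with an annotation text, which is one way a
"sounds like" form could have been bound to a standard spelling. A
minimal sketch in the style of the Ruby Annotation markup (the rb
and rt element names come from that specification; the example word
is my own):

   <ruby>
     <rb>Worcester</rb>   <!-- base text: the dictionary spelling -->
     <rt>WUSS-ter</rt>    <!-- annotation: the "sounds like" form -->
   </ruby>

Point 3 above records that the WG reviewed this approach and found
it unsuited to the grammar pronunciation problem.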
Received on Friday, 7 September 2001 22:58:17 UTC