- From: Nils Klarlund <klarlund@research.att.com>
- Date: 09 Nov 2001 14:39:59 -0500
- To: www-voice@w3.org
Samuel L. Bayer and his colleagues at the MITRE Corporation have offered some honest and well-argued technical remarks. But I believe that their conclusion, that VoiceXML must be thoroughly reworked and relieved of patent burdens before being endorsed, might do more harm than good. As someone who was briefly close to the genesis of VoiceXML, I recall that many of the issues raised by the MITRE group were extensively debated. In fact, the VoiceXML specification emerged through a synthesis of notations, each of which emphasized an aspect MITRE finds lacking: PML emphasized the configuration of voice menus, VoxML offered a declarative approach to control flow and to the management of help and error situations, and SpeechML featured a very clean use of tagging for describing the essential contents of dialogues. I believe that VoiceXML incorporates these qualities to some degree.

But VoiceXML is obviously a compromise that attempts to merge the declarative aspect of dialogues with the need for computational meaning. A markup language is not very useful as a standard if it does not have a meaning that can be understood by computers. And when a markup language acquires meaning in this sense, it becomes a programming language. This is the case even for HTML: it is a programming language for a specific purpose, namely interactive display, and its meaning is given through the CSS specifications published by the W3C. In the beginning, HTML had little meaning beyond what was expressed in widely differing implementations, a legacy that still plagues browsing technology. The VoiceXML approach avoids this misstep. In general, the distinction between programming and markup languages is not clear, except perhaps to say that a markup language is a systematic way of providing parameters to programs. In the case of spoken dialogue systems, the design space for characterizing parameters is enormous, and I think that VoiceXML is a helpful beginning.
Although the MITRE group may be correct about the failure of VoiceXML as a programming language, it is nevertheless a comprehensive attempt at embodying the characteristics of dialogue systems in one relatively abstract framework. ("Relatively" means with respect to other computer-telephony standards.) I do not believe that the flaws of VoiceXML significantly impede the industry. Quite the contrary, VoiceXML could be an important vehicle for furthering the art: nothing would prevent, for example, MITRE researchers from developing markup languages that complement VoiceXML. They could use techniques of representation and program transformation that are decades old, and are now becoming popular through XML, to work at more declarative levels. With VoiceXML, there would at least be one commonly available target notation in terms of which such markup languages could be defined. Currently, it is disheartening to see how even pioneering speech recognition companies are struggling to make money. Delays in adopting even one reasonably abstract standard might further delay the wider deployment of speech technology. In the big picture, even the patent dispute seems less important to me: if royalties were involved, I would be surprised if the amounts would impede anybody from making progress in the area. (I am guessing here, I admit.) Obstacles concerning infrastructure, the lack of development tools, and consumer acceptance are probably much more important. I believe that in due time, elegant (and even royalty-free) solutions that convincingly distinguish and connect the markup and programming-language layers will emerge. If companies and researchers have worked at a sufficiently abstract level, they may even be able to replace the VoiceXML parts of their solutions simply by rewriting their program transformations.
The widespread adoption of VoiceXML will make it obvious to everybody in this area that the technology must be improved in many ways, perhaps in many more ways than we currently envisage.

/Nils Klarlund

Usual disclaimers apply to all of the above.
Received on Friday, 9 November 2001 14:40:25 UTC