Re: Comments on VoiceXML 2.0 from Nils Klarlund on 2001-11-09 (www-voice@w3.org from October to December 2001)

From: Nils Klarlund <klarlund@research.att.com>
Date: 09 Nov 2001 14:39:59 -0500
To: www-voice@w3.org
Message-ID: <u668jka28.fsf@research.att.com>

Samuel L. Bayer and his colleagues of the MITRE Corporation have
offered some honest and well-argued technical remarks.  But I believe
that their conclusion---that VoiceXML must be thoroughly reworked and
relieved of patent burdens before being endorsed---might do more harm
than good.

As someone who was briefly close to the genesis of VoiceXML, I recall
that many of the issues raised by the MITRE people were extensively
debated.  In fact, the VoiceXML specification emerged through a
synthesis of notations that each emphasized aspects MITRE finds
lacking: PML emphasized the configuration of voice menus, VoxML
offered a declarative approach to the control flow and help/error
situation management, and SpeechML featured a very clean use of
tagging for describing the essential contents of dialogues.

I believe that VoiceXML incorporates these qualities to some
degree.  But VoiceXML obviously is a compromise that attempts to merge
the declarative aspect of dialogues with the need for computational
meaning.  A markup language is not very useful as a standard if it
does not have a meaning that can be understood by computers.  And when
a markup language acquires meaning in this sense, it becomes a
programming language.

This is the case even for HTML: it is a programming language for a
specific purpose, namely interactive display, and its meaning is given
through the CSS specifications as published by the W3C.  In the
beginning, HTML did not have much meaning except for that expressed in
widely different implementations---a legacy that still plagues
browsing technology.  This misstep is avoided in the VoiceXML
approach.

In general, the distinction between programming and markup languages
is not clear except perhaps to say that a markup language is a
systematic way of providing parameters to programs.  In the case of
spoken dialogue systems, the design space for characterizing
parameters is enormous, and I think that VoiceXML is a helpful
beginning.

Although the MITRE group may be correct about the failure of VoiceXML
as a programming language, it is nevertheless a comprehensive attempt
at embodying the characteristics of dialogue systems in one relatively
abstract framework.  ("Relatively" means with respect to other
computer-telephony standards.)

I do not believe that the flaws of VoiceXML significantly impede the
industry.  Quite the contrary, VoiceXML could be an important vehicle
for furthering the art: there is nothing that would prevent e.g. MITRE
researchers from studying markup languages to complement VoiceXML.
And they could use techniques of representation and program
transformation that are decades old, and now becoming popular through
XML, to work at more declarative levels.  For with VoiceXML, there
would be at least one commonly available target notation in terms of
which markup languages could be defined.
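
As an illustration of the kind of program transformation meant here,
the following is a minimal sketch in Python.  It assumes a made-up,
more declarative menu description (the MENU dictionary and the
menu_to_vxml function are hypothetical names, not part of any
specification) and translates it into a schematic VoiceXML menu using
the <vxml>, <menu>, <prompt>, and <choice> elements.  The output is
meant only to show the shape of the target notation; a conforming
document may need further attributes and declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical high-level description of a voice menu.
# The keys and file names are illustrative assumptions.
MENU = {
    "prompt": "Say sports, weather, or news.",
    "choices": {
        "sports": "sports.vxml",
        "weather": "weather.vxml",
        "news": "news.vxml",
    },
}


def menu_to_vxml(menu):
    """Translate the hypothetical menu description into a VoiceXML string."""
    root = ET.Element("vxml", {"version": "2.0"})
    menu_el = ET.SubElement(root, "menu")
    ET.SubElement(menu_el, "prompt").text = menu["prompt"]
    # One <choice> per phrase; "next" names the document to go to.
    for phrase, target in menu["choices"].items():
        choice = ET.SubElement(menu_el, "choice", {"next": target})
        choice.text = phrase
    return ET.tostring(root, encoding="unicode")


print(menu_to_vxml(MENU))
```

Under this approach, switching to a different target notation would
mean rewriting only menu_to_vxml, while the declarative description
stays unchanged.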

Currently, it is disheartening to see how even pioneering speech
recognition companies are struggling to make money.  Delays in
adopting at least one reasonably abstract standard might further delay
wider deployment of speech technology.

In the big picture, even the patent dispute seems less important to
me: if royalties were involved, I would be surprised if the monetary
amounts impeded anybody from making progress in the area.  (I am
guessing here, I admit.)

Really, obstacles concerning infrastructure, the lack of development
tools, and consumer acceptance are probably much more important.

I believe that, in due time, elegant (and even royalty-free) solutions that
convincingly distinguish and connect markup and programming-language
layers will emerge.  If companies and researchers have worked at a
sufficiently abstract level, then they may even be able to replace the
VoiceXML parts of their solutions by simply rewriting their program
transformations.

The widespread adoption of VoiceXML will make it obvious to
everybody in this area that the technology must be improved in many
ways, and perhaps in many more ways than we envisage currently.

/Nils Klarlund


Usual disclaimers apply to all of the above.
Received on Friday, 9 November 2001 14:40:25 UTC