
[SSML Last Call] "say-as" element creates under-modeled markup

From: Al Gilman <asgilman@iamdigex.net>
Date: Thu, 23 Jan 2003 22:20:13 -0500
Message-Id: <5.1.0.14.2.20030123212253.0289aec0@pop.iamdigex.net>
To: www-voice@w3.org


<note>

Sorry this is late.

In view of the timing, this comment has not been discussed in the PF group.
It is, however, based on a clash between the design of the "say-as"
construct and the goals set out in the current draft of the XML
Accessibility Guidelines (XAG), which has been discussed there quite a bit.

References:

"say-as" element
http://www.w3.org/TR/speech-synthesis/#S2.1.4

XAG Checkpoint 2 (have a model) and 4 (export the model):
http://www.w3.org/TR/xag#g2_0
http://www.w3.org/TR/xag#g4_0

</note>

The say-as element has attributes named "interpret-as" and "format".
However, the format specification neither defines these in such a way
as to create an interoperable information capture, nor does it require
the user of these attributes to do so in user-provided declarations.

The examples given of ordinal numbers and telephone numbers, on the other
hand, are clear examples of information elements with well-posed value
domains and application semantics.  This categorical information would be
valuable in a variety of adaptations to meet the needs of people with
disabilities, such as re-representing the information in modes other than
speech.

A XAG-compliant dialect would ensure that all interpret-as and format
values assigned to speakable strings were mechanically connected with
machine-comprehensible expressions of the proper characteristics,
wherever machine-comprehensible expression of such characteristics was
readily achievable.

In fact, in the use cases suggested in the examples, the 'format' attribute
is used for indicating the semantic variety involved, while the
"interpret-as" attribute is used for the more presentation-level encoding of
these information items in overloads of the integers.

This element is in violation of XAG Checkpoint 4.9, "Do not assume that element
or attribute names provide any information about element semantics."  It is
ironically so, in that the name at least for the 'format' attribute is the
opposite of its suggested usage.

There is no semantics actually defined for these attributes, except for
possible heuristic values which are clearly only understood within the
working group, as they in some cases are the reverse of common usage.  These
two attributes are semantically as specific as the html:any.class attribute,
but are named in such a way as to appear more specific, although they are not.
The language would be better off sticking with .class as in HTML if there is no
semantic backup for the values applied under these names, but the format
should set up mechanisms for backups as to the sense of the values of marks
which guide the same rendering decisions as here, and not leave these as
bare user-defined strings.

This syntax, or the HTML-like .class attribute syntax, could perhaps be
characterized in metadata and brought up to a level of definition meeting
XAG Guideline 4.

On the other hand, the information to be conveyed by markup with this
element could be spelled out in the metadata section.

In future production use, the information that "say-as" is designed to denote
should mostly be handled by lexicon references, but the lexicon standard is
not there yet.  But a dc.relation.conformsTo link to a type declaration in
the XSD type system would be a feasible form of inline lexicon support for
the kinds of characteristics that seem to be targeted here.
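
A minimal sketch of what such a connection could look like in a processing
tool, offered only as an illustration. It assumes a registry that maps
attribute values to type-declaration URIs; the registry, the function name
unmodeled_say_as, the sample values, and the example.org URI are all
hypothetical and do not come from the SSML draft or from XAG. The sketch
reports "say-as" values that have no declared type behind them.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: attribute value -> URI of a type declaration.
# The keys and the example.org URI are assumptions made for illustration.
TYPE_REGISTRY = {
    "ordinal": "http://www.w3.org/2001/XMLSchema#positiveInteger",
    "telephone": "http://example.org/types#telephoneNumber",
}


def unmodeled_say_as(ssml_text, attr="interpret-as", registry=TYPE_REGISTRY):
    """Return the values of `attr` on say-as elements that lack a declared type."""
    root = ET.fromstring(ssml_text)
    missing = []
    for elem in root.iter():
        # Drop any namespace prefix so the check works with or without one.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name != "say-as":
            continue
        value = elem.get(attr)
        if value not in registry:
            missing.append(value)
    return missing


# Sample document with one registered value and one bare user-defined string.
SAMPLE = (
    "<speak>"
    '<say-as interpret-as="ordinal">3</say-as>'
    '<say-as interpret-as="myformat">12-4</say-as>'
    "</speak>"
)

if __name__ == "__main__":
    print(unmodeled_say_as(SAMPLE))
```

An adaptive application could use the same lookup in the other direction:
given a value with a declared type, it would know the value domain well
enough to re-represent the content in a mode other than speech.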

Please consider ways that we can get the value domain and appropriate
application information that goes with these information elements better
exposed for processing in adaptive applications.

There is information that this element is trying to capture that is very
important in speech rendering of texts.  It is just not modeled well in this
language feature.  The markup should focus on the content species.  An
ordinal number, for example, is a well known conceptual species; there are
multiple definitions in standards that one could refer to, to convey its
nature.  This will give the speech generation module what it needs by way of
decision basis in order to inflect the voicing appropriately.  An undefined
user-inserted string doesn't establish a basis for interoperation with respect
to the applicable semantics.  This has been clear from the history of 'rel'
and 'rev' on html:link and similar attributes elsewhere.

Al
Received on Thursday, 23 January 2003 22:20:20 GMT
