[48 hour] DRAFT Last Call comment Re: [XML 1.1] Allowable element names from Al Gilman on 2002-07-04 (wai-xtech@w3.org from July 2002)

From: Al Gilman <asgilman@iamdigex.net>
Date: Thu, 04 Jul 2002 14:50:49 -0400
To: WAI Cross-group list <wai-xtech@w3.org>
Cc: Rick Jelliffe <ricko@topologi.com>
Message-Id: <5.1.0.14.2.20020704135231.0212c190@pop.iamdigex.net>
<callForConsensus
class="48hour /WAI/PF">

As Chair of the PF working group, I am [nearly] prepared to declare that PF has 
reached consensus on the following comment to the XML 1.1 Last Call draft.  
This makes our comment late, but as there is stuff that needs to be said we will 
go ahead and say it.

Please comment on this DRAFT comment on the <wai-xtech@w3.org> list, unless for some
odd reason you really need to rely in your argument on something that is visible in
Member space and not visible in public.

The '48 hour' aspect of this request for comments is as follows:  comments posted
after 2002-07-06 19:00Z may be overlooked in the formulation of a PF comment to the
XML 1.1 Last Call.

Al

</callForConsensus>

<comment
class="DRAFT"
rdfs:about="http://www.w3.org/TR/xml11/">

** name characters in XML 1.1 and access to content creation for people with disabilities **

** summary

[expansions of these three points appear in a following 'details' section]

* The use of arbitrary Unicode characters as name characters in XML is quite likely to impose
serious hardships on people with visual disabilities wishing to create document instances
and application-specific dialects in XML 1.1.  Pardon the double negative, but one of
our conclusions on reviewing this point is that it is not a non-issue.  Call this "odd
characters may make inaccessible names."

* At the level of creating a markup vocabulary or 'dialect' of XML, a good practice to follow
would be to adopt some actual natural language as the base for symbol creation, and to use as
symbols in the vocabulary either actual real words from that language or plausible 
abbreviations and agglutinations from the natural vocabulary of that language.  Call this
"good-symbols BCP at dialect level."

* The discussion was inconclusive, however, as to what if any _character level constraints_
were appropriate to apply against _name characters_, globally in all XML 1.1.  Call this
"no clear cut."

** discussion of details

* Odd characters may make inaccessible names:

The use case is for a person who is blind or has seriously low vision to be able to edit
XML document instances, DTDs and schemas.  The presumed level of automation is an editor
which internalizes the rules of well-formed XML and is otherwise transparent to the text.
So symbols in the XML used as names of element types and attribute types come through the
XML-recognition of the editor as verbatim sequences of characters from the document 
character set.  For these symbols to function as symbols in the editing of document instances
they should ideally be speakable as words for the speech-output user, and transliteratable
into braille characters for the braille-output user.

There is a fall-back to 'spell' mode in the text-to-speech but this is significantly 
more tedious for symbols that are long enough, and could become an ability-to-do-job 
make-or-break consideration.  There will be some of each in our working model of the 
users for this "walkthrough test case."  While some very popular XML dialects such as 
XHTML Basic will have editors available with higher levels of recognition built in, 
XML as a language-building technology is not whole unless this level of editing is 
available.  In the life cycle of every dialect it will be needed in some stage of 
the workflow, and this stage of the workflow should be open to participation
by people with these visual-ability conditions.  

People with disabilities tend to be operating off a technology base that is one generation behind.  People with two disabilities are often two generations behind.

Note that the standard practice in Braille transcription is to 'bleep' out un-transliteratable
characters with some wild card expression such as [***].  All bleeps so generated will
appear the same, so the distinction among symbols may be totally lost unless one adds
a schema-aware editor that substitutes from schema annotations on the fly.  It is not 
clear that this extra level of investment in editor internals is a reasonable expectation
for editors for such a small market.
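
To make the loss of distinction concrete, here is a minimal sketch in Python.  It is
an illustration only, not a model of any real Braille transcription tool: the set of
"transliteratable" characters (ASCII letters, digits and a few name punctuation marks)
and the two element names are assumptions invented for the example.

```python
import string

# Assumption for illustration only: treat ASCII letters, digits and a few
# XML name punctuation marks as the characters a braille table can render.
TRANSLITERABLE = set(string.ascii_letters + string.digits + "-_.:")

BLEEP = "[***]"  # the wild card expression mentioned in the text


def bleep(name: str) -> str:
    """Replace every character outside TRANSLITERABLE with the wild card."""
    return "".join(ch if ch in TRANSLITERABLE else BLEEP for ch in name)


# Two distinct (hypothetical) element names ending in music symbols...
sharp_note = "note\u266f"  # "note" + MUSIC SHARP SIGN
flat_note = "note\u266d"   # "note" + MUSIC FLAT SIGN

# ...are rendered identically once the odd characters are bleeped.
print(bleep(sharp_note))
print(bleep(flat_note))
print(bleep(sharp_note) == bleep(flat_note))
```

Under these assumptions both names come out as the same string, so a braille-output
user could not tell the two element types apart without help from the editor.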

* Good-symbols BCP at dialect level:

Dialects which use real words and abbreviations or agglutinations of real words will 
transform gracefully under text-to-speech and Braille transliteration.

It might appear good to assert this guideline at the level of standards or guidelines
for dialect definition, roughly at the level of the XML Accessibility Guidelines.

However, the status of such guidelines in the W3C opus is unclear.  The XAG may be put on
a Recommendation Track in the re-chartering of the PF Working Group, but this may not
be assumed.

So things that are properly done at the lower level in XML itself with the 1.1 revision 
*should be done there* and not wait for a "maybe we will publish something" at a higher
level.

* No clear cut:

In some applications, mathematical symbols and music notes may indeed be apt mnemonics
for element types, where the element semantics lines up with a single character symbol
or frequently encountered cluster such as B-flat.  In this sense the definition of a 
naturally occurring vocabulary has to be regarded as an extant domain of discourse and
not strictly a language which is natural in the sense that it is the first language
of some speaker group.

Control characters would seem to be pretty bad from a broad base of applications.

But we can't necessarily eliminate all punctuation.  If we did we could miss the 
opportunity to do agglutinations in some languages.  Camel Case only works in caseful 
languages.  IIRC there are languages where word boundaries require explicit word-break 
characters, and these would be required to use that language as a base and a phrase as 
a symbol for an element or attribute type.

There are natural languages that don't have natural orthographic spellings.  These 
include sign languages and spoken languages for which writing is not common among the
speaker group.  These are languages for which the WAI seeks equal access but in these
cases it does not appear that the well-formed-XML-editor use-case can be achieved.
In this case natural language expressions will _have_ to be associated with the XML 
symbols by formalized indirection, such as through the annotation facilities in XSD.

This is a long-winded explanation of why, in our consideration of this issue, no 
consensus emerged for any given name-character admissible set.

** Background:

Please use the following references to review the discussion behind this comment:

 http://lists.w3.org/Archives/Public/wai-xtech/2002Jun/thread.html#17

(Member access only)
 http://lists.w3.org/Archives/Member/w3c-wai-pf/2002AprJun/thread.html#251

</comment>

At 08:15 PM 2002-06-19, Charles McCathieNevile wrote:

>XML 1.1 is in last call awaiting comments on or before 28 June:
>http://www.w3.org/TR/xml11/
>
>A question has been raised about whether there should be restrictions on what
>characters can be used in element and attribute names, and if so what kind of
>restrictions.
>
>The issue comes about particularly when people are going to edit XML. If they
>can't determine the name of an element or an attribute (for example if it is
>a symbolic character or collection of them, rather than a recognisable word)
>then they will not be able to work with the XML language. For example, screen
>readers do not necessarily have a capacity to present math characters or
>"dingbats" - symbols like smiley faces that exist in Unicode as characters,
>and music notes may not be meaningful to people who are Deaf. Likewise, it is
>important for international usage that Arabic or Chinese or Thai characters
>can be used by people whose natural writing script is one of those (and so on
>for other scripts).
>
>Some thoughts have been suggested. Broadly there are a couple of different
>approaches, although there are also intermediary possibilities.
>
>1. There should be restrictions that require names to come from a single
>range of characters used in a single language, and should be based on
>meaningful words (this could be enforced by requiring a dictionary lookup).
>
>2. It is fine to use any characters, since authoring tools can allow the
>editor to assign their own version of the name for local use - i.e. doing a
>search and replace before beginning, or whenever an unusable name is
>encountered, and then convert those back to the required characters on
>saving.
>
>A possible intermediate position is that an XML language must have a schema
>which provides an annotation that can be used as an alternative name, or
>documentation so the author can understand the purpose of the element and
>provide a name useful to them.
>
>This has implications for and relationship to the XML Accessibility
>Guidelines - http://www.w3.org/WAI/PF/XML - as well as for authoring tool
>accessibility guidelines and internationalisation.
>
>The PFWG has decided to continue its discussions in public, to enable the
>public working groups to easily join the discussion and see the issues.
>
>Cheers
>
>Charles McCN
Received on Thursday, 4 July 2002 14:51:30 UTC