Re: [48 hour] DRAFT Last Call comment Re: [XML 1.1] Allowable element names

At 08:33 AM 2002-07-05, Marja-Riitta Koivunen wrote:
>Does this mean that a group of Finnish users could not agree on using tags written in Finnish words with a-umlaut and o-umlaut?

Not in the slightest.

The purpose here is to say that Finnish words, complete with o-umlaut, are cool,
and abbreviations of Finnish words that a Finnish speaker would often guess right
are cool, in an XML markup vocabulary where all the symbols make sense to a
Finnish speaker.  Throwing Finnish words into a markup vocabulary that is
predominantly Urdu or Swahili in its mnemonic interpretation is not so cool.

On the other hand, strings of dingbats that don't spell anything used in
spoken Finnish, or in any other natural-conversation language that people use
between people (without the intermediary of paper or computer), are not so cool.

Math and music are special cases because they reflect an advanced level of
education, but they are international.  So by their international readership
they build both a sizeable community of literates, and community bonds 
cutting across first-language divisions.  So we can't even say that vocabularies 
should be limited to languages that are naturally occurring first languages of 
speaker groups.

The "Silk Road" fair on the Mall this week in Washington DC has a slogan of
"Connecting Cultures, Creating Trust."  That's pretty blatant in its 
pursuit of understanding across groups who frequently encounter difficulty
trusting one another, in part because of differences in first language.

It is no accident that the activities in this fair are heavily involved
with crafts, food and music -- things that can be appreciated readily without a
language dependency.

The early normalisation issue is a detail.  This question asks "Should all XML
vocabularies pre-normalise their symbols so that two vocabularies that use the
same word will have the same Unicode character sequence, not one symbol
representing the o-umlaut with one character and another representing the
o-umlaut with two characters?"

This will make equality of symbols for the reader and equality of symbols for the
computer be the same equality.  This is a good thing to achieve.
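
To make the one-character versus two-character o-umlaut concrete, here is a
minimal sketch in Python.  It assumes NFC as the normalisation form, which this
message does not specify, and the element name "yö" and the helper same_name
are made up purely for illustration.

```python
import unicodedata

# Two spellings of the same hypothetical Finnish element name "yö".
precomposed = "y\u00f6"   # o-umlaut as one character, U+00F6
decomposed = "yo\u0308"   # "o" followed by U+0308 COMBINING DIAERESIS


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalising both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The reader sees the same word; the raw code point sequences differ.
raw_equal = precomposed == decomposed
normalised_equal = same_name(precomposed, decomposed)
print(raw_equal, normalised_equal)
```

Without normalisation the computer treats the two names as different symbols
even though they look identical on the page; after normalising both sides the
comparison agrees with what the reader sees.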

Compare with


>What if some of them had cognitive problems in learning other languages? My understanding is that Finns with visual problems already often have tools that can pronounce Finnish so the main problem is for English speaking people wanting to read these Finnish tags?
>Not that I'm totally against this but I don't think it is so straightforward from an accessibility viewpoint unless your native tongue is English.
At 02:50 PM 7/4/2002 -0400, Al Gilman wrote:
class="48hour /WAI/PF"
>>As Chair of the PF working group, I am [nearly] prepared to declare that PF has
>>reached consensus on the following comment to the XML 1.1 Last Call draft.
>>This makes our comment late, but as there is stuff that needs to be said we will
>>go ahead and say it.
>>Please comment on this DRAFT comment on the <> list, unless for some
>>odd reason you really need to rely in your argument on something that is visible in
>>Member space and not visible in public.
>>The '48 hour' aspect of this request for comments is as follows:  comments posted
>>after 2002-07-06 19:00Z may be overlooked in the formulation of a PF comment to the
>>XML 1.1 Last Call.
>>** name characters in XML 1.1 and access to content creation for people with disabilities **
>>** summary
>>[expansions of these three points appear in a following 'details' section]
>>* The use of arbitrary Unicode characters as name characters in XML is quite likely to impose
>>serious hardships on people with visual disabilities wishing to create document instances
>>and application-specific dialects in XML 1.1.  Pardon the double negative, but one of
>>our conclusions on reviewing this point is that it is not a non-issue.  Call this "odd
>>characters may make inaccessible names"
>>* At the level of creating a markup vocabulary or 'dialect' of XML, a good practice to follow
>>would be to adopt some actual natural language as the base for symbol creation, and to use as
>>symbols in the vocabulary either real words from that language or plausible
>>abbreviations and agglutinations from the natural vocabulary of that language.  Call this
>>"good-symbols BCP at dialect level."
>>* The discussion was inconclusive, however, as to what if any _character level constraints_
>>were appropriate to apply against _name characters_, globally in all XML 1.1.  Call this
>>"no clear cut."
>>** discussion of details
>>* Odd characters may make inaccessible names:
>>The use case is for a person who is blind or has seriously low vision to be able to edit
>>XML document instances, DTDs and schemas.  The presumed level of automation is an editor
>>which internalizes the rules of well-formed XML and is otherwise transparent to the text.
>>So symbols in the XML used as names of element types and attribute types come through the
>>XML-recognition of the editor as verbatim sequences of characters from the document
>>character set.  For these symbols to function as symbols in the editing of document instances
>>they should ideally be speakable as words for the speech-output user, and
>>transliteratable into braille characters for the braille-output user.
>>There is a fall-back to 'spell' mode in the text-to-speech but this is significantly
>>more tedious for symbols that are long enough and could become an ability-to-do-job
>>make-or-break consideration.  There will be some of each in our working model of the
>>users for this "walkthrough test case."  While some very popular XML dialects such as
>>XHTML Basic will have editors available with higher levels of recognition built in,
>>XML as a language-building technology is not whole unless this level of editing is
>>available.  In the life cycle of every dialect it will be needed in  some stage of
>>the workflow, and this stage of the workflow should be open to participation
>>by people with these visual-ability conditions.
>>People with disabilities tend to be operating off a technology base that is one generation behind.  People with two disabilities are often two generations behind.
>>Note that the standard practice in Braille transcription is to 'bleep' out un-transliteratable
>>characters with some wild card expression such as [***].  All bleeps so generated will
>>appear the same, so the distinction among symbols may be totally lost without adding
>>a schema-aware editor that is substituting from schema annotations on the fly.  It is not
>>clear that this extra level of investment in editor internals is a reasonable expectation
>>for editors for such a small market.
>>* Good-symbols BCP at dialect level:
>>Dialects which use real words and abbreviations or agglutinations of real words will
>>transform gracefully under text-to-speech and Braille transliteration.
>>It might appear good to assert this guideline at the level of standards or guidelines
>>for dialect definition, roughly at the level of the XML Accessibility Guidelines.
>>However, the status of such guidelines in the W3C opus is unclear.  The XAG may be put on
>>a Recommendation Track in the re-chartering of the PF Working Group, but this may not
>>be assumed.
>>So things that are properly done at the lower level in XML itself with the 1.1 revision
>>*should be done there* and not wait for a "maybe we will publish something at a higher
>>* No clear cut:
>>In some applications, mathematical symbols and music notes may indeed be apt mnemonics
>>for element types, where the element semantics lines up with a single character symbol
>>or frequently encountered cluster such as B-flat.  In this sense the definition of a
>>naturally occurring vocabulary has to be regarded as an extant domain of discourse and
>>not strictly a language which is natural in the sense that it is the first language
>>of some speaker group.
>>Control characters would seem to be pretty bad from a broad base of applications.
>>But we can't necessarily eliminate all punctuation.  If we did we could miss the
>>opportunity to do agglutinations in some languages.  Camel Case only works in caseful
>>languages.  IIRC there are languages where word boundaries require explicit word-break
>>characters, and these would be required to use that language as a base and a phrase as
>>a symbol for an element or attribute type.
>>There are natural languages that don't have natural orthographic spellings.  These
>>include sign languages and spoken languages for which writing is not common among the
>>speaker group.  These are languages for which the WAI seeks equal access but in these
>>cases it does not appear that the well-formed-XML-editor use-case can be achieved.
>>In this case natural language expressions will _have_ to be associated with the XML
>>symbols by formalized indirection, such as through the annotation facilities in XSD.
>>This is a long-winded explanation of why, in our consideration of this issue, no
>>consensus emerged for any given name-character admissible set.
>>** Background:
>>Please use the following references to review the discussion behind this comment:
>>At 08:15 PM 2002-06-19, Charles McCathieNevile wrote:
>>>XML 1.1 is in last call awaiting comments on or before 28 June:
>>>A question has been raised about whether there should be restrictions on what
>>>characters can be used in element and attribute names, and if so what kind of
>>>The issue comes about particularly when people are going to edit XML. If they
>>>can't determine the name of an element or an attribute (for example if it is
>>>a symbolic character or collection of them, rather than a recognisable word)
>>>then they will not be able to work with the XML language. For example, screen
>>>readers do not necessarily have a capacity to present math characters or
>>>"dingbats" - symbols like smiley faces that exist in Unicode as characters,
>>>and music notes may not be meaningful to people who are Deaf. Likewise, it is
>>>important for international usage that Arabic or Chinese or Thai characters
>>>can be used by people whose natural writing script is one of those (and so on
>>>for other scripts).
>>>Some thoughts have been suggested. Broadly there are a couple of different
>>>approaches, although there are also intermediary possibilities.
>>>1. There should be restrictions that require names to come from a single
>>>range of characters used in a single language, and should be based on
>>>meaningful words (this could be enforced by requiring a dictionary lookup).
>>>2. It is fine to use any characters, since authoring tools can allow the
>>>editor to assign their own version of the name for local use - i.e. doing a
>>>search and replace before beginning, or whenever an unusable name is
>>>encountered, and then convert those back to the required characters on
>>>A possible intermediate position is that an XML language must have a schema
>>>which provides an annotation that can be used as an alternative name, or
>>>documentation so the author can understand the purpose of the element and
>>>provide a name useful to them.
>>>This has implications for and relationship to the XML accessibility
>>>Guidelines - - as well as for authoring tool
>>>accessibility guidelines and internationalisation.
>>>The PFWG has decided to continue its discussions in public, to enable the
>>>public working groups to easily join the discussion and see the issues.
>>>Charles McCN

