Re: Name Constraints should be kept in XML 1.1 from John Cowan on 2002-02-04 (www-xml-blueberry-comments@w3.org from February 2002)

From: John Cowan <cowan@mercury.ccil.org>
Date: Sun, 3 Feb 2002 20:35:28 -0500 (EST)
To: Rick Jelliffe <ricko@allette.com.au>
CC: www-xml-blueberry-comments@w3.org
Message-Id: <E16XY2a-0002et-00@mercury.ccil.org>

Rick Jelliffe scripsit:

> Since there has been no response to my direct email to the WG several months
> ago, I assume it has fallen through the cracks and I hope the WG will forgive
> me for requesting that the issues raised in that email will find their way onto the
> issues list.

The fault is entirely mine, Rick.  I am tasked with writing a reply,
and have shamefully neglected it.  The WG is not at fault.
What you are now reading is entirely personal.

> There seems to be two rationales for removing the name restrictions in XML.
> First, to decouple XML from the particular version of Unicode (supposedly
> bringing in, thereby, new scripts), and second to simplify XML.

AFAIK the first rationale is the operative one; the second is a side
issue, and not even true (given that we expect XML 1.1 parsers to
enforce XML 1.0 rules in 1.0 documents).

> The cost is, of course, that XML documents with mislabelled encodings are 
> less likely to be caught.

But only by the sheerest chance.  If a Russian document is marked up 
using non-Russian tags (as probably the majority of Russian XML
documents are), then the difference between KOI, Windows, and
ISO encodings will not be caught by the name rules -- but it will
lead to the text being utterly unintelligible.
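
A minimal sketch of the point above, in Python. The sample document and
the helper names (mislabel, ascii_part) are hypothetical, chosen only for
illustration. It encodes a document with ASCII tag names and Russian
content as KOI8-R, then decodes it under two wrong labels. Decoding
succeeds without error and the tag names come through untouched, so no
name rule could notice; only the content is garbled.

```python
# Hypothetical sample: ASCII tag names, Russian text content
# (the escapes spell a six-letter Russian word).
SAMPLE = "<greeting>\u043f\u0440\u0438\u0432\u0435\u0442</greeting>"


def mislabel(text, actual, declared):
    """Encode with the actual encoding, then decode as the declared one."""
    return text.encode(actual).decode(declared)


def ascii_part(s):
    """Keep only the ASCII characters (here, exactly the markup)."""
    return "".join(ch for ch in s if ord(ch) < 128)


# Written as KOI8-R but labelled as Windows-1251 or ISO-8859-5.
as_windows = mislabel(SAMPLE, "koi8_r", "cp1251")
as_iso = mislabel(SAMPLE, "koi8_r", "iso8859_5")

for label, garbled in (("cp1251", as_windows), ("iso8859_5", as_iso)):
    # Markup is identical; the text content is not.
    print(label, ascii_part(garbled) == ascii_part(SAMPLE), garbled != SAMPLE)
```
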

> Is there another alternative which does not throw the baby out with the bathwater?

Not in the general case.  But anything that does semantic interpretation
will surely fail on miscoded documents even if no name rules are
in place at all.  How likely is it that a name meaningful to an
application will be accidentally transcoded into another such name?

> In particular, I suggest the WG consider or re-consider the following two part solution:
> 
> 1) "A name error MUST be reported as a validity error. A name error MAY
>   be reported as a WF error."

I think it would cause a very serious problem for parsers to
disagree about what is and what is not WF -- which is to say,
what is and what is not XML.  We are already going to induce
this problem by moving to XML 1.1, but at least that should be
temporary.

> 2) "The naming rules should make use of the Unicode identifier properties.
> with whatever changes are needed, rather than being enumerated.
> 
> John Cowan's excellent work a year ago on this should be followed.
> The WG should follow the Unicode properties: it is ironic to discard
> them in the name of increased Unicode support.

The difficulty here (and I myself now support the WG's position)
is that we believe the resistance to introducing XML 1.1 will be
quite large enough as it is, though we may be able to push it through.
But when Unicode 4.0 comes out, if we then need XML 1.2, the
resistance to change will be all the larger, and so on as increasingly
uncommercial scripts are added.  Whoever is at the end of the list
may wind up being *permanently* locked out because the community
will no longer accept changes.

There are two angles to this: changing the XML specification repeatedly,
and changing implementations repeatedly.  A clever specification,
such as the one I originally proposed, could be completely stable
at the W3C level, leaving all the change to occur at the Unicode
level alone (aka "getting W3C out of the Unicode business").  But
that would not solve the problem of implementing and, above all,
*deploying* the revisions to millions of embedded parsers.
Admittedly, these are mostly non-validating, but in that case,
what becomes of the supposed essential gimmick for detecting
bad encodings?  The bulk of all parsers will not enforce it.
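
To make the two angles concrete, here is a rough Python sketch of a name
check driven by Unicode general categories instead of enumerated
character ranges. The category sets and function names are assumptions
made for illustration; they are not the rules of the proposal mentioned
above, nor those of XML 1.0 or 1.1. The rule text never changes, but the
answer depends on the Unicode tables shipped with the runtime, which is
exactly the deployment problem.

```python
import unicodedata

# Assumed category sets, for illustration only.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_name_start(ch):
    """A letter-like character, underscore, or colon."""
    return ch in "_:" or unicodedata.category(ch) in START_CATEGORIES


def is_name_char(ch):
    """A name-start character, or a mark, digit, connector, '-' or '.'."""
    return ch in "_:-." or unicodedata.category(ch) in CONTINUE_CATEGORIES


def is_name(s):
    """True if s is non-empty and every character passes the checks."""
    return bool(s) and is_name_start(s[0]) and all(is_name_char(c) for c in s[1:])


# The verdict for a newly encoded script depends on this version,
# not on the text of the rule above.
print("Unicode tables in this runtime:", unicodedata.unidata_version)
print(is_name("greeting"), is_name("1abc"))
```

A character assigned in a later Unicode version reports category "Cn"
(unassigned) on an older runtime, so the same name can be accepted by an
updated parser and rejected by one that was never redeployed.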

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_
Received on Sunday, 3 February 2002 20:35:21 UTC