- From: John Cowan <cowan@mercury.ccil.org>
- Date: Sun, 3 Feb 2002 20:35:28 -0500 (EST)
- To: Rick Jelliffe <ricko@allette.com.au>
- CC: www-xml-blueberry-comments@w3.org
Rick Jelliffe scripsit:

> Since there has been no response to my direct email to the WG several months
> ago, I assume it has fallen through the cracks and I hope the WG will forgive
> me for requesting that the issues raised in that email will find their way onto the
> issues list.

The fault is entirely mine, Rick. I am tasked with writing a reply, and have shamefully neglected it. The WG is not at fault. What you are now reading is entirely personal.

> There seem to be two rationales for removing the name restrictions in XML.
> First, to decouple XML from the particular version of Unicode (supposedly
> bringing in, thereby, new scripts), and second to simplify XML.

AFAIK the first rationale is the operative one; the second is a side issue, and not even true (given that we expect XML 1.1 parsers to enforce XML 1.0 rules in 1.0 documents).

> The cost is, of course, that XML documents with mislabelled encodings are
> less likely to be caught.

But only by the sheerest chance. If a Russian document is marked up using non-Russian tags (as probably the majority of Russian XML documents are), then the difference between KOI, Windows, and ISO encodings will not be caught by the name rules -- but it will lead to the text being utterly unintelligible.

> Is there another alternative which does not throw the baby out with the bathwater?

Not in the general case. But anything that does semantic interpretation will surely fail on miscoded documents even if no name rules are in place at all. How likely is it that a name meaningful to an application will be accidentally transcoded into another such name?

> In particular, I suggest the WG consider or re-consider the following two part solution:
>
> 1) "A name error MUST be reported as a validity error. A name error MAY
> be reported as a WF error."

I think it would cause a very serious problem for parsers to disagree about what is and what is not WF -- which is to say, what is and what is not XML. We are already going to induce this problem by moving to XML 1.1, but at least that should be temporary.

> 2) "The naming rules should make use of the Unicode identifier properties,
> with whatever changes are needed, rather than being enumerated."
>
> John Cowan's excellent work a year ago on this should be followed.
> The WG should follow the Unicode properties: it is ironic to discard
> them in the name of increased Unicode support.

The difficulty here (and I myself now support the WG's position) is that we expect the resistance to introducing XML 1.1 to be quite large enough as it is, though we may be able to push it through. But when Unicode 4.0 comes out, if we then need XML 1.2, the resistance to change will be all the larger, and so on as increasingly uncommercial scripts are added. Whoever is at the end of the list may wind up being *permanently* locked out because the community will no longer accept changes.

There are two angles to this: changing the XML specification repeatedly, and changing implementations repeatedly. A clever specification, such as the one I originally proposed, could be completely stable at the W3C level, leaving all the change to occur at the Unicode level alone (aka "getting W3C out of the Unicode business"). But that would not solve the problem of implementing and, above all, *deploying* the revisions to millions of embedded parsers. Admittedly, these are mostly non-validating, but in that case, what becomes of the supposed essential gimmick for detecting bad encodings? The bulk of all parsers will not enforce it.

--
John Cowan   http://www.ccil.org/~cowan   cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all. There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_
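
The remark in the message about Russian documents with non-Russian tags can be illustrated with a rough sketch. Everything here is an assumption for illustration only: the sample element, the helper names `misread` and `split_element`, and the choice of Python's `koi8_r`, `cp1251`, and `iso8859_5` codecs to stand in for the "KOI, Windows, and ISO" encodings. It is not a parser, only a demonstration that an ASCII tag name reads the same under all three labels while the character data does not.

```python
import re

# Hypothetical sample: a Russian document with an ASCII (non-Russian) tag name.
DOC = "<title>привет</title>"

def misread(text: str, actual: str, labelled: str) -> str:
    """Encode with the real encoding, then decode using the (wrong) label."""
    return text.encode(actual).decode(labelled)

def split_element(doc: str) -> tuple[str, str]:
    """Return (tag name, character data) for one simple element."""
    m = re.fullmatch(r"<([^>]+)>(.*)</\1>", doc)
    if m is None:
        raise ValueError("not a single simple element")
    return m.group(1), m.group(2)

# The bytes are really KOI8-R; try reading them under each label.
results = {
    label: split_element(misread(DOC, "koi8_r", label))
    for label in ("koi8_r", "cp1251", "iso8859_5")
}

for label, (name, text) in results.items():
    # The name is identical every time; only the text changes.
    print(f"{label:10} name={name!r} text={text!r}")
```

Under the wrong labels the text still consists entirely of Cyrillic letters, so even a Cyrillic tag name would probably have passed a letter-based name check in this particular case.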
Received on Sunday, 3 February 2002 20:35:21 UTC