- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 10 Jul 2001 09:11:24 -0400 (EDT)
- To: www-xml-blueberry-comments@w3.org
----- Forwarded message from Tim Bray -----

Date: Mon, 09 Jul 2001 21:33:12 -0700
From: Tim Bray <tbray@textuality.com>
Subject: Blueberry/Unicode/XML
In-reply-to: <3B49E743.5042FDCD@mitre.org>
To: xml-dev@lists.xml.org
Message-id: <5.1.0.14.2.20010709211010.02557b20@pop.intergate.ca>

Boy, this one's tough. I buy neither Elliotte's assertion that changing XML is unthinkable, nor John Cowan's assertion that the depth of the cultural affront to users of pre-Unicode-3.1 languages is so high as to outweigh consideration of cost.

I just went and reviewed the Blueberry requirements at http://www.w3.org/TR/xml-blueberry-req and I'm not very comfy with them. There is repeated and specific reference to the problem being that posed by Unicode 3.1. The problem isn't 3.1, it's that Unicode is an unfinished standard that continues to grow actively, whereas it would be nice if we could declare XML syntax finished and go back to our plows.

XML 1.0 took a design decision in favor of enumeration of name characters, simply because the alternative - outsourcing the problem to the Unicode/ISO10646 process - had two problems: (a) we didn't know them well enough to trust them, and (b) writing a satisfying set of rules for XML name chars based solely on Unicode metadata is pretty hard.

The force of argument (b) is unabated. (a) seems less of a worry now simply because the Unicode and XML gangs have gotten pretty comfy with each other. But I do have a worry at the back of my mind whether the W3C *institutionally* ought to trust the consortium *institutionally* with something of this magnitude. And what happens if ISO and Unicode stop getting along one of these centuries? Whose side is XML on?

A few weeks ago, I was in favor of leaving it the way it is, but only by about 55-45. I found the most convincing argument on the other side was the person who postulated a Khmer user typing away in emacs and having a disconnect because there are lots of characters they can use for people's names but not as attribute names.
On the other hand, this problem is not unique to Khmer - just ask Mr. O'Hara. And the notion of having a single monolithic XML whose interoperability, while not perfect, is pretty $#!%* good, partially based on those unwieldy character-class productions, is something that it will hurt to lose. And it is a reasonable position to say "The markup name character class snapshot was based on Unicode 2.0, sorry 'bout that."

Realistically, there are 3 options:

1. Leave it the way it is.
2. Do Blueberry and then repeat the process for Unicode 3.2 and 4.0 and so on every couple of years forever.
3. Bite the bullet, write the rules in terms of Unicode metadata and go to a pure use-by-reference architecture, probably adding a syntactic signal to reference the Unicode version number.

I think (3) will prove to be really hard to do well - and then the Unicode metadata fields might get changed and screw it all up. I think (2) is not unreasonable, but has the institutional disadvantage that the XML standardization effort has to become an ongoing process ad infinitum. I still go for (1).

My opposition to NEL has hardened, because of a strong fear that this one will cause real wreckage on a widespread basis, not just in linguistic corner cases.

But I really can't see how anyone can get behind any of these positions and feel entirely comfortable with where they find themselves standing. I sure don't.

-Tim

----- End of forwarded message from Tim Bray -----

--
John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter
Received on Tuesday, 10 July 2001 09:11:28 UTC