- From: Tim Bray <tbray@textuality.com>
- Date: Fri, 14 Dec 2001 16:56:34 -0800
- To: www-xml-blueberry-comments@w3.org
1. The principle of decoupling the XML spec from successive revisions of Unicode is the only sensible way forward.

2. If no consensus can be built around the details of this set of changes, it would be acceptable to declare defeat and go on with XML 1.0 2nd ed as-is. This would be a regrettable outcome but not fatal at a deep level.

3. Issue 18: The costs of allowing #x1-#x1F appear to me to exceed the benefits. Among other things, many of these ASCII control chars, despite being several decades old, have little consensus concerning their semantics, e.g. EOT and EOM (#x3 and #x4). I think from the XML point of view these things are actively pernicious; specifically the notion that semantics are embedded in characters rather than being expressed by markup. The case of "textual content that may contain such characters (but typically does not)" is pretty non-convincing. In *many* cases the occurrence of these characters is evidence of an error.

4. Issue 21: The cost of allowing null bytes in XML content is very high and the benefits hard to understand.

5. I strongly feel that #x85 (NEXT LINE) should not be added to the S production. The reason is a simple cost-benefit analysis; the proportion of computing installations where this is an issue is not large and is shrinking as a proportion of the infrastructure. Supporting this change imposes significant conversion costs on the rest of the world; the total global net cost would be significantly less if the mainframe software infrastructure took the necessary corrective measures to deal with XML 1.0 as specified.

6. I strongly feel, even more so than in the case of #x85, that #x2028 is inappropriate for inclusion in S. Here are some reasons:
   - If LINE SEPARATOR is to be included, why not the many other Unicode characters with spacing semantics? A coherent explanation needs to be provided on this point and I am unconvinced that one exists.
   - This would be the only core XML syntax character that can't fit in a byte. This would complicate several automaton-driven parser construction strategies. One of the key design goals of XML is to make programmers' lives simpler, so this objection should have weight.
   - "For completeness" is a really flimsy argument.

7. In [4], #x37A is included, which is a combining character and shouldn't be in NameStart.

8. In [4], #xF7 is included (division sign), but the rest of the mathematical operators (starting at #x2200) are excluded.

9. The inclusion of a block #x202A-#x218F is kind of puzzling... this is in the middle of one of the punctuation blocks, and the first few chars seem really unsuitable. What's the intent... wanting to include the currency symbols? This definitely needs some explanation.

10. There are some problems in the #x2800-#xD7FF block. Do we really want CJK radicals (#x2E80...), compatibility Jamo, ideographic description chars, and so on?

11. Should that block end at #xD7AF or #xD7FF?

12. [#xFDE0-#xFFEF] includes the private use area and lots of compatibility characters which XML 1.0 actually deprecates for use at all, let alone as names. This is astounding and needs some defense. If this is OK, why not throw in all the punctuation?

13. What's wrong with ASCII digits as name start chars, given that all sorts of other digits are going in?

14. There really needs to be some deep discussion in this document of why this alternative was chosen. When I look at some of the wildly unlikely things that are allowed to appear in names, the obvious question is why not rely on the Unicode properties database.

15. Issue 11: I can see both sides of this question. My intuition is that the computational cost of doing this is unacceptably high for high-throughput applications of XML, but we need some research to establish if this is the case. If it can be done cheaply and compactly, it's probably a good idea.

--
Cheers, Tim Bray, Founder, Antarcti.ca Systems
+1-604-873-6100 (o) +1-604-785-8532 (m)
http://antarcti.ca http://map.net
Received on Friday, 14 December 2001 19:56:44 UTC