- From: Tim Bray <tbray@textuality.com>
- Date: Sun, 13 Jun 1999 15:10:48 -0700
- To: John Stracke <francis@thibault.org>, xml-editor@w3.org
At 05:13 PM 6/13/99 -0400, John Stracke wrote:
>I'm building an XML parser, and I'm somewhat confused by the
>spec's productions Letter and Digit. My concern is that, if
>a new character set is defined next week, then existing XML
>parsers won't consider any of its characters to be Letters
>or Digits

You've put your finger on one of the real hard problems with XML. Production [2], for Character, makes it clear that you can use, as a character, pretty well anything that the appropriate committees add to Unicode. On the other hand, every time they add a new character set, it will in general contain some things that fall under "letter" and others that don't.

Note that the XML spec outlines the algorithm that we used to identify what we consider a "letter"; is this extensible to new character sets? At the moment we just don't know.

For what it's worth, XML 1.0 is 100% totally clear on what's a letter and what isn't, and includes most of the languages that most people are going to be using... but there's no doubt that there's a problem lurking out there that's going to have to be solved sometime. Fortunately, the key committees in the XML, IETF, and Unicode spaces know about the problem and are already worrying.

-Tim
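
For a parser writer, the asymmetry described above might look roughly like the sketch below. It is an illustration, not code from the spec: the function names are made up, and LETTER_RANGES_SUBSET holds only the first few BaseChar ranges plus the main Ideographic range from XML 1.0 Appendix B, so a real parser would need the full tables.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 production [2] Char: open-ended ranges of code points."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Hypothetical, deliberately incomplete excerpt of the Letter tables.
LETTER_RANGES_SUBSET = [
    (0x0041, 0x005A),  # BaseChar: A-Z
    (0x0061, 0x007A),  # BaseChar: a-z
    (0x00C0, 0x00D6),  # BaseChar: Latin-1 letters
    (0x00D8, 0x00F6),
    (0x00F8, 0x00FF),
    (0x4E00, 0x9FA5),  # Ideographic
]


def is_letter_in_subset(cp: int) -> bool:
    """Fixed-table lookup: anything not enumerated is never a Letter."""
    return any(lo <= cp <= hi for lo, hi in LETTER_RANGES_SUBSET)


if __name__ == "__main__":
    # A code point far outside the enumerated tables.
    cp = 0x10400
    print(is_xml_char(cp), is_letter_in_subset(cp))
```

The Char check accepts whole ranges, including code points that had no assignment when the spec was written, while the Letter check is a closed enumeration. A character added to Unicode later is therefore legal as content but cannot appear in a name until the tables themselves change.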
Received on Sunday, 13 June 1999 18:10:53 UTC