- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 22 Aug 2022 09:04:24 -0600
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: public-ixml@w3.org
Steven Pemberton <steven.pemberton@cwi.nl> writes:

> Alas! I had completely failed to see in the past that there is a
> class LC! And the ixml rule for a class is:
> -class: code.
> @code: capital, letter?.
> -capital: ["A"-"Z"].
> -letter: ["a"-"z"].
> Thus, our first bug...
> Easiest fix is
> -letter: [a-zA-Z].

I think it would probably be worthwhile being more explicit in the
prose about

(a) the fact that by "character category" we mean the short-hand
    values for the General_category property in the Unicode database,
    and the aliases defined by Unicode for sets of such values. Our
    bibliographic reference is explicit enough, I guess, but if I
    didn't already know what we meant, I don't know how easily I would
    infer it from the current text.

(b) the fact that the set of characters matched by a character
    category in an ixml character set will vary depending on the
    version of Unicode supported by a processor.

(c) whether all ixml processors are required to support Unicode 13.0
    and only Unicode 13.0, or whether they may support other versions
    in addition or instead.

(d) assuming that we want loose coupling with Unicode, not tight
    coupling, the advice that a conformance claim for an ixml
    processor should include information about which version of
    Unicode it supports. (And for that matter, which version of XML.)

For what it's worth, 'LC' was introduced in version 8 of TR 44,
published in 2012 with Unicode 6.1. That presumably explains why it's
not in the XSD spec's list of character classes (based on 3.1).

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Monday, 22 August 2022 15:31:16 UTC