Re: First bug in spec... from C. M. Sperberg-McQueen on 2022-08-22 (public-ixml@w3.org from August 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 22 Aug 2022 09:04:24 -0600
To: Steven Pemberton <steven.pemberton@cwi.nl>
Cc: public-ixml@w3.org
Message-ID: <87r1189txr.fsf@blackmesatech.com>

Steven Pemberton <steven.pemberton@cwi.nl> writes:

>  Alas! I had completely failed to see in the past that there is a
>  class LC! And the ixml rule for a class is:

> -class: code.
>          @code: capital, letter?.
>       -capital: ["A"-"Z"].
>        -letter: ["a"-"z"].

>  Thus, our first bug...

> Easiest fix is

>    -letter: [a-zA-Z].
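
The bug and the proposed fix can be checked with a minimal Python
sketch. The two regular expressions are stand-ins for the quoted rules
(an assumption for illustration, not ixml), and the names ORIGINAL,
PROPOSED and accepts are hypothetical.

```python
import re

# Regex stand-ins for the quoted ixml rules (assumed equivalents, not ixml).
# ORIGINAL mirrors `capital, letter?` with letter restricted to lower case.
ORIGINAL = re.compile(r"[A-Z][a-z]?")
# PROPOSED mirrors the suggested fix: letter may be upper or lower case.
PROPOSED = re.compile(r"[A-Z][a-zA-Z]?")


def accepts(pattern, code):
    """Return True if the whole class code matches the pattern."""
    return pattern.fullmatch(code) is not None


if __name__ == "__main__":
    for code in ("L", "Lu", "LC"):
        print(code, accepts(ORIGINAL, code), accepts(PROPOSED, code))
```

Under these assumptions the original rule accepts "L" and "Lu" but
rejects "LC", while the widened rule accepts all three.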

I think it would probably be worthwhile being more explicit in the prose
about

  (a) the fact that by "character category" we mean the shorthand
  values for the General_Category property in the Unicode database, and
  the aliases defined by Unicode for sets of such values. Our
  bibliographic reference is explicit enough, I guess, but if I didn't
  already know what we meant, I don't know how easily I would infer it
  from the current text.

  (b) the fact that the set of characters matched by a character
  category in an ixml character set will vary depending on the version
  of Unicode supported by a processor.

  (c) whether all ixml processors are required to support Unicode 13.0
  and only Unicode 13.0, or whether they may support other versions in
  addition or instead.

  (d) assuming that we want loose coupling with Unicode, not tight
  coupling, the advice that a conformance claim for an ixml processor
  should include information about which version of Unicode it
  supports.  (And for that matter, which version of XML.)
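
Points (a) and (b) can be illustrated with a minimal Python sketch. It
assumes Python's standard unicodedata module as the source of
General_Category values; the names ALIASES, in_category and
count_members are hypothetical, and only the LC alias (Lu, Ll, Lt) is
spelled out. This is not how any ixml processor is required to work.

```python
import sys
import unicodedata

# Group alias defined by Unicode for a set of General_Category values.
# Only LC is listed here; the table is illustrative, not complete.
ALIASES = {"LC": ("Lu", "Ll", "Lt")}


def in_category(ch, code):
    """Return True if character ch belongs to the category named by code."""
    gc = unicodedata.category(ch)
    if code in ALIASES:
        return gc in ALIASES[code]
    # A one-letter code such as "L" covers every two-letter value
    # that starts with that letter.
    return gc == code or (len(code) == 1 and gc.startswith(code))


def count_members(code):
    """Count the code points in the category, for this Unicode version."""
    return sum(in_category(chr(cp), code) for cp in range(sys.maxunicode + 1))


if __name__ == "__main__":
    # The count depends on the Unicode version bundled with the interpreter.
    print("Unicode version:", unicodedata.unidata_version)
    print("Members of LC:", count_members("LC"))
```

Running this under interpreters built against different Unicode
versions may print different counts, which is the variation described
in (b), and the version string is the kind of information a
conformance claim could report under (d).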

For what it's worth, 'LC' was introduced in version 8 of TR 44,
published in 2012 with Unicode 6.1.  That presumably explains why it's
not in the XSD spec's list of character classes (based on Unicode 3.1).

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Monday, 22 August 2022 15:31:16 UTC