
Re: [Bug 10008] New: Use of Unicode blocks that no longer exist in regular expressions.

From: Michael Kay <mike@saxonica.com>
Date: Thu, 24 Jun 2010 17:00:22 +0100
Message-ID: <4C238116.5080806@saxonica.com>
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
CC: oliver@cbcl.co.uk, www-xml-schema-comments@w3.org

I would think a reasonable approach is:

for every string that has ever been used as a block name in any version 
of Unicode, accept that string and bind it to the set of characters that 
it refers to in the latest version of Unicode in which it was used.
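
A minimal sketch of that rule, in Python. The table below is a tiny hand-written illustration, not real data: the version labels and ranges are assumptions to be checked against the Blocks.txt file of each Unicode release, and the names `BLOCKS_BY_VERSION`, `build_block_table` and `lookup` are hypothetical. Block names are written with spaces removed, as in \p{IsGreekandCoptic}.

```python
# Illustrative subset only; a real implementation would load every
# released Blocks.txt rather than hard-code entries like these.
BLOCKS_BY_VERSION = {
    "3.1": {"Greek": (0x0370, 0x03FF)},
    "3.2": {"GreekandCoptic": (0x0370, 0x03FF)},
}


def version_key(version: str) -> tuple:
    """Turn '3.2' into (3, 2) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def build_block_table(blocks_by_version: dict) -> dict:
    """Bind each block name ever used to its range in the latest
    version that still used that name."""
    table = {}
    for version in sorted(blocks_by_version, key=version_key):
        # Later versions overwrite earlier ones, so the last writer
        # for a name is the latest version in which it appeared.
        table.update(blocks_by_version[version])
    return table


def lookup(escape_name: str, table: dict) -> range:
    """Resolve the name inside \\p{...}, e.g. 'IsGreek', to code points."""
    if not escape_name.startswith("Is"):
        raise ValueError(f"not a block escape: {escape_name}")
    first, last = table[escape_name[2:]]
    return range(first, last + 1)


TABLE = build_block_table(BLOCKS_BY_VERSION)
```

With a table built this way, both `lookup("IsGreek", TABLE)` and `lookup("IsGreekandCoptic", TABLE)` resolve, whichever version of the database the schema author had in mind.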

Michael Kay
Saxonica

On 24/06/2010 16:34, C. M. Sperberg-McQueen wrote:
> On 24 Jun 2010, at 08:11 , C. M. Sperberg-McQueen wrote:
>> ... section G.1.1 Character Class Escapes [says]
>>
>>    When the implementation supports multiple versions of the Unicode
>>    database, and they differ in salient respects (e.g. different
>>    properties are assigned to the same character in different versions
>>    of the database), then it is ·implementation-defined· which set of
>>    property definitions is used for any given assessment episode.
>>
>> ...
>>
>> XSD 1.1 requires you to document how you determine which version of
>> the database to use in interpreting block names.  It does not, as far
>> as I can see, require anything further.  (It does not, for example,
>> appear to require that you always use the same version within a given
>> validation, though as a user I think I'd rather that you did.)
>
> I should read more carefully.  The phrase "is used for any given
> assessment episode" does seem to convey the expectation that an
> implementation should interpret all regexes in a given validation
> according to the same version of the Unicode database.
>
> I'm still not sure that it explicitly *requires* it, though.  If
> for example two separately maintained schema documents assume
> different versions of the Unicode database -- one writes \p{IsGreek}
> and the other \p{IsGreekandCoptic}, say -- then it's hard to see
> how an implementation could limit itself to a single version of
> the database in a schema composed from those two schema documents.
> So I'd argue that it cannot and should not be *required*, though
> of course it's probably simpler all around if a single version
> of the database is used for any given validation.
>
> Sorry for missing this aspect of the issue in my earlier mail.
>
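
The difficulty described in the quoted mail can be made concrete with a small Python sketch. The version labels and block-name sets here are illustrative assumptions, not data taken from the Unicode database, and `versions_supporting` is a hypothetical helper: it asks which single version of the database recognizes every block name used in a composed schema.

```python
# Block names (spaces removed, as in \p{Is...}) known to each version.
# Illustrative assumption: the name Greek was replaced by GreekandCoptic
# between these two versions.
NAMES_BY_VERSION = {
    "3.1": {"Greek"},
    "3.2": {"GreekandCoptic"},
}


def versions_supporting(names, names_by_version):
    """Return the versions whose block list contains every given name."""
    return [
        version
        for version, known in names_by_version.items()
        if all(name in known for name in names)
    ]


# One schema document writes \p{IsGreek}, the other \p{IsGreekandCoptic}.
schema_a = ["Greek"]
schema_b = ["GreekandCoptic"]
combined = schema_a + schema_b
```

Under these assumptions `versions_supporting(combined, NAMES_BY_VERSION)` is empty, while each schema document on its own is satisfied by exactly one version.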
Received on Thursday, 24 June 2010 16:01:00 GMT
