RE: Regex in Datatypes 2e

From: Michael Kay <mike@saxonica.com>
Date: Sat, 25 Oct 2008 09:41:42 +0100
To: "'James Clark'" <jjc@jclark.com>, <www-xml-schema-comments@w3.org>
Message-ID: <9592AF25D27E4AAEB1529F9B0F7C340C@Sealion>

These are known issues, though without very satisfactory resolutions (yet).
They are logged in the W3C Bugzilla database, but I can't get a connection
to the W3C server this morning, so I can't give you chapter and verse. 

> 
> 1.  The non-BMP blocks (such as Gothic) seem to have 
> disappeared from the table of block names in F.1.1 in the 2nd edition.

Michael Sperberg-McQueen diligently investigated the history of this and
came to the conclusion that they must have been omitted as a result of an
editorial error rather than as a conscious WG decision.
> 
> 2. There are a couple of notes saying "All ·minimally 
> conforming· processors ·must· support the character 
> properties/blocks defined in the version of [Unicode 
> Database] that is current at the time this specification 
> became a W3C Recommendation".  Does this mean the time of 
> publication of the 1st edition (3.1) or the time of 
> publication of the 2nd edition (4.0)?

The intent of the WG, I believe, was that the Schema spec should pick up new
revisions of Unicode automatically, without requiring the Schema spec itself
to be changed. However, I believe that at the time this policy decision was
made, the WG was unaware of the fact that this would invalidate existing
schemas because of changes in the block names (another serious one is that
"Greek" has become "GreekAndCoptic"). Another impact is that some characters
have been moved to a different character category. I don't think the WG has
yet found an answer to this problem.
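
To make the breakage concrete, here is a rough sketch (Python, purely
illustrative) of how a pattern written against one revision of the block
table stops resolving under a later one. The two tables, their names, and the
helper unknown_blocks are hypothetical and heavily abbreviated; only the
Greek range U+0370..U+03FF is taken as fact, and the new block name is
spelled as in the paragraph above.

```python
import re

# Hypothetical, abbreviated block tables: block name -> (first, last) code point.
# "OLD" and "NEW" stand for an earlier and a later Unicode revision; they are
# labels for illustration, not exact version numbers.
BLOCKS_OLD = {"BasicLatin": (0x0000, 0x007F), "Greek": (0x0370, 0x03FF)}
BLOCKS_NEW = {"BasicLatin": (0x0000, 0x007F), "GreekAndCoptic": (0x0370, 0x03FF)}

# Matches block escapes of the form \p{IsName} or \P{IsName} and captures Name.
BLOCK_ESCAPE = re.compile(r"\\[pP]\{Is([A-Za-z0-9-]+)\}")

def unknown_blocks(pattern, blocks):
    """Return the block names used in the pattern that the table does not define."""
    return [name for name in BLOCK_ESCAPE.findall(pattern) if name not in blocks]

schema_pattern = r"\p{IsGreek}+"
print(unknown_blocks(schema_pattern, BLOCKS_OLD))  # []
print(unknown_blocks(schema_pattern, BLOCKS_NEW))  # ['Greek']
```

The same code points are covered in both tables; only the name has changed,
which is enough to make a previously valid schema fail on a processor that
tracks the newer database.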

Michael Kay
(personal response)  
Received on Saturday, 25 October 2008 08:42:24 GMT
