RE: Comment on XML Schema 1.1 from Michael Kay on 2009-05-14 (www-xml-schema-comments@w3.org from April to June 2009)

From: Michael Kay <mike@saxonica.com>
Date: Thu, 14 May 2009 12:00:13 +0100
To: <henrik@juul-nyholm.org>
Cc: <www-xml-schema-comments@w3.org>
Message-ID: <799CB938EAAA477094E63762DB9E27D9@Sealion>

A personal response: the WG will give a formal response in due course.

Thanks for your detailed reading of the spec. There are some interesting
ideas here. Unfortunately the work is not really at a requirements-gathering
phase: XSD 1.1 requirements were agreed back in 2003 and we're just
finishing the spec to meet those requirements. However favourably the WG
views some of these, if we kept holding back the spec to accommodate more
good ideas, we would never finish. 

> 
> NAMESPACES
> (Affects both the XML Schema and the XML standard itself.)

Indeed: and the place for rules like this is probably in the XML Namespaces
spec, not in XML Schema. I don't think individual XML vocabularies (of which
XSD is one) should place constraints on how namespaces are declared; that
would be incorrect layering of the stack.
> 
> RESOLVING LINKED XML SCHEMAS
> 
> Local targetNamespace declarations
> By allowing local target namespace declarations it should be 
> possible to merge all linked XML Schemas (linked by include, 
> import and/or redefine) into only one valid XML Schema.

In principle I personally agree that it would be a good idea not to have
such a strong link between schema documents and namespaces, but I know there
are others who would differ on this. Whatever the merits, it's a new
requirement that's out of scope for the current round.
> 
> Value patterns
> It should be allowed to add more than one value pattern (reg. 
> exp) as facet to a restricted simple type. 

You're probably aware that you can define more than one pattern in a single
restriction step, and they are treated as alternatives (ORed). It might have
been better if it had been defined so they were ANDed together; but it's not
a very common requirement, and there's a workaround: use multiple steps of
restriction, since the patterns from successive steps must all be satisfied.
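
A minimal sketch of that behaviour, assuming Python's re module as a
stand-in for XSD regular expressions (the two syntaxes are not identical).
The function names and the patterns are hypothetical and chosen only for
illustration.

```python
import re

def matches_step(value, patterns):
    """One restriction step: its pattern facets are alternatives (OR)."""
    return any(re.fullmatch(p, value) for p in patterns)

def matches_type(value, steps):
    """A chain of restriction steps: every step must be satisfied (AND)."""
    return all(matches_step(value, patterns) for patterns in steps)

# Hypothetical type with two patterns in ONE step: letters OR digits.
one_step = [[r"[A-Z]+", r"[0-9]+"]]

# Hypothetical type built in TWO steps: alphanumeric AND exactly 3 characters.
two_steps = [[r"[A-Z0-9]+"], [r".{3}"]]

print(matches_type("ABC", one_step))    # True: matches the first alternative
print(matches_type("A1", one_step))     # False: matches neither alternative
print(matches_type("AB1", two_steps))   # True: satisfies both steps
print(matches_type("AB12", two_steps))  # False: fails the length step
```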
> 
> Resolving simple types
> It should be made explicit that xs:anySimpleType and 
> xs:anyAtomicType take any value facet, grouped by xs:choice 
> in non-numeric, numeric and date facets.

I'm sorry, I don't understand what you are proposing here.

If anything, the spec has gone the other way, and said that all user-defined
atomic types must be defined as restrictions of a primitive type (and not
directly as a restriction of xs:anySimpleType).

> SIMPLE TYPES
> 
> Reg. exp. for built-in simple types
> The built-in simple types should all be normatively defined as 
> regular expressions (together with text definitions), making 
> the validation tools more conformant.

Defining a regular expression that captures all values in the lexical space
of xs:long is possible, but it's a grotesque regex. Defining it using
minInclusive and maxInclusive is so much easier. Why do you think a regex
would be better?
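
To make the comparison concrete, here is a rough sketch of the
range-based approach: a trivial regex for the integer lexical form plus the
two bounds of xs:long. It assumes whitespace has already been normalized,
and the function name is hypothetical.

```python
import re

LONG_MIN = -9223372036854775808  # -(2**63), the minInclusive bound
LONG_MAX = 9223372036854775807   # 2**63 - 1, the maxInclusive bound

# Optional sign followed by one or more decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def is_valid_long(lexical):
    """Check the lexical form first, then the value against the bounds."""
    if not INTEGER_LEXICAL.fullmatch(lexical):
        return False
    # More than 19 significant digits cannot be in range; rejecting early
    # also avoids converting absurdly long digit strings.
    significant = lexical.lstrip("+-").lstrip("0")
    if len(significant) > 19:
        return False
    return LONG_MIN <= int(lexical) <= LONG_MAX

print(is_valid_long("9223372036854775807"))  # True: the upper bound itself
print(is_valid_long("9223372036854775808"))  # False: one above the bound
print(is_valid_long("+0042"))                # True: sign and leading zeros
```

A single regex that encodes the same two bounds would have to enumerate
digit ranges position by position, which is where the ugliness comes from.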
> 
> Canonical and lexical definitions
> The built-in simple types should only have one definition. It 
> is NOT the task of XML Schema to make standards for 
> presentation of data, which is a far more complex task than 
> covered by XML Schema 1.0 or 1.1, and out of scope. The 
> double definitions cause confusion about what is really the 
> correct definition.

I'm not sure what you're asking for here.
> 
> Simple type hierarchy
> The simple type hierarchy should be cleaned up and reflect 
> the logical relations instead of the current mixture of 
> logical relations and "historical" relations.

Unfortunately (a) there are many views on which relations are logical (for
example, whether xs:anyURI should be a restriction of xs:string) and it's
possible to get into endless metaphysical debates about the one true type
hierarchy; and (b) there's a strong presumption in favour of backwards
compatibility - that is, leaving it alone rather than making gratuitous
improvements that will impose transition costs on users.
> 
> Boolean type
> The built-in type 'boolean' should be restricted to accept only 
> 'true' and 'false', making the type more stringent.

Removing the values "0" and "1" would be a serious backwards
incompatibility.

And of course you can restrict the values to true/false using a pattern.
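
As a small illustration of that restriction, the sketch below models the
four lexical forms of xs:boolean and a derived type whose pattern admits
only true and false. Python's re module stands in for XSD regular
expressions, and the function names are hypothetical.

```python
import re

# The lexical space of xs:boolean.
BOOLEAN_LEXICAL = {"true", "false", "1", "0"}

# Pattern facet of a hypothetical derived type that excludes "1" and "0".
STRICT_PATTERN = re.compile(r"true|false")

def is_boolean(lexical):
    """Valid against the built-in type."""
    return lexical in BOOLEAN_LEXICAL

def is_strict_boolean(lexical):
    """Valid against the restricted type: base type AND the pattern."""
    return is_boolean(lexical) and STRICT_PATTERN.fullmatch(lexical) is not None

print(is_boolean("1"))            # True
print(is_strict_boolean("1"))     # False
print(is_strict_boolean("true"))  # True
```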

Personally (if we were entertaining new requirements) I would like to see
facets that enable user-defined lexical representations, such as yes/no,
ja/nein, or on/off.
> 
> 
> DATA CORRECTION
> 
> The "whiteSpace" facet and "default" value attribute should 
> expressly be categorised as "data correction" definitions 
> and not data validation definitions, making it clear that 
> such data corrections are to be made AFTER validation. 
> Otherwise a number of built-in simple types make no sense. Or 
> especially the enumerated valid values of the whiteSpace 
> facet make no sense (ex. collapse and replace).
> 
The whiteSpace facet is in fact now categorized as "pre-lexical" - it
happens before validation, not after.
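
A rough sketch of that ordering, assuming the usual meaning of the three
whiteSpace values (preserve, replace, collapse). The function names are
hypothetical, and the integer check is only a stand-in for real validation.

```python
import re

def normalize_whitespace(value, mode):
    """Apply whiteSpace normalization before any lexical check."""
    if mode == "preserve":
        return value
    # replace: each tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # collapse: after replace, squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace mode: " + mode)

def is_integer_after_normalization(raw):
    """Normalize first (collapse), then check the lexical form."""
    return re.fullmatch(r"[+-]?[0-9]+", normalize_whitespace(raw, "collapse")) is not None

print(repr(normalize_whitespace(" a\tb  c\n", "replace")))   # ' a b  c '
print(repr(normalize_whitespace(" a\tb  c\n", "collapse")))  # 'a b c'
print(is_integer_after_normalization(" 42\n"))               # True
```

Because normalization runs first, a value such as " 42\n" is checked as
"42", which is why the surrounding whitespace does not make it invalid.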

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
Received on Thursday, 14 May 2009 11:00:56 UTC