Re: [Bug 3219] Choice of XML 1.0 vs XML 1.1 datatypes

(Hmm.  I'm in the position of having to respond to this while 
disconnected, meaning I can't get to the Bugzilla web interface.  A "reply 
to all" seems to include a Bugzilla address.  Let's see if that does 
anything useful.  If not, we'll have to copy this response into Bugzilla 
manually.  It really is a lot easier if we use tools that work 
disconnected.  Anyway...)

Mike Kay wrote:

> The choice of whether datatypes dependent on XML use the 1.0 or 1.1
> definitions is stated in the Status section to be implementation-defined,
> but section 1.3 says it should be under user-control (by means external to
> the schema itself). Firstly, this is contradictory.
> 
> In my view it would be better if this were defined within the
> schema, perhaps using a new facet. This would ensure that different
> software processors making use of the same schema did not apply different
> interpretations, and therefore (for example) that a document considered
> valid by its sender would not be considered invalid by its recipient.
> It's the job of the schema to say exactly which strings are valid for a
> given type and which aren't.

Well, the only way to make use of a facet would be to derive new types 
using that facet.  That takes us down the path of having something like 
xsd:string (the legacy 1.0 type) and xsd:string11, presumably both descended 
from some magical new base type.  Or, perhaps, it means having 
xsd:string11 as a new ancestor of xsd:string.  I think that having types 
like xsd:string11 will be the death of XML 1.1, presuming it's 
otherwise in good shape.  I don't think users will want to deal explicitly 
with the differently named types, and I think there would be an unfortunate 
growth in function signatures for F&O, etc.

One of the reasons to have the designation outside of the schema is for 
the benefit of the many systems, like databases, that will deal with more 
than one document.  If each document or schema can choose its own type 
system, then a container like a database must deal with both types (e.g. 
of string), which is a complication.  This design lets a database make a 
consistent choice, if it wishes to.  First of all, if the database is 
already deployed with the 1.0 types, it can continue to use those even 
with schemas that use other schema 1.1 features like weak wildcards and 
co-constraints.  If the schema could dictate use of 1.1 types, then that 
would be a big hurdle for bringing schema 1.1 into certain database 
systems.  Conversely, a newer database system can universally adopt the 
1.1 types, which in general will accept all 1.0 content, just failing to 
throw validity errors in certain cases where an instance uses 1.1 names or 
strings.  Granting that such acceptance could conceivably be an integrity 
or security issue for some applications, we grant that database the 
latitude to validate with 1.0 rules should the application wish it to.

I think that the arrival of XML 1.1 put us in a no-win position, since a 
clean system would have only one "type" for abstractions like string and 
xsd:Name, and somewhere, somehow, we now need two.  I think the compromise 
we've chosen is a reasonable one.

> The Note suggesting that the choice might be driven by the version
> label on the input document is inappropriate for some QT contexts, where
> there is no relevant input document, or where the XML declaration is no
> longer available.

Yes, but that's exactly why it's a note suggesting this policy as an 
option for the situations where it does apply, and not as a requirement or 
even a desideratum for the situations where it doesn't.

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Sunday, 21 May 2006 20:03:13 UTC