Potential new issue: PSVI considered harmful

[Introductory note: I am not a W3C XML Schema expert, a PSVI expert, or
an XQuery expert, so it's perfectly possible that I'm way off-base here.
My feelings won't be hurt in the slightest if someone points this out.]

The notion of a PSVI (Post-Schema Validation Infoset) has arisen out of
the W3C XML Schema work, and is finding use in XPath 2.0 and XQuery.  The
PSVI differs from the plain XML infoset in two ways:

  - the addition of default element/attribute values provided in the schema
  - the addition of type information declared in the schema, i.e. you can
    tell that the content of this attribute is supposed to be a date, and
    that of that element a floating-point number in the range -1.0..+1.0.
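
The two additions listed above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the Declaration
and AugmentedAttribute classes, the augment() function, and the sample
attribute names are all hypothetical, and real PSVI properties are
considerably richer than this.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Declaration:
    """Hypothetical stand-in for one attribute declaration in a schema."""
    type_name: str
    default: Optional[str] = None


@dataclass(frozen=True)
class AugmentedAttribute:
    """An attribute value plus the two kinds of information described above."""
    value: str
    type_name: str      # type information taken from the schema
    from_default: bool  # True if the schema, not the instance, supplied the value


def augment(attributes: Dict[str, str],
            declarations: Dict[str, Declaration]) -> Dict[str, AugmentedAttribute]:
    """Return a PSVI-like view of `attributes`.

    Undeclared attributes are ignored in this sketch.
    """
    result = {}
    for name, decl in declarations.items():
        if name in attributes:
            # Value came from the instance; only type information is added.
            result[name] = AugmentedAttribute(attributes[name], decl.type_name, False)
        elif decl.default is not None:
            # Value is absent from the instance; the schema default is added.
            result[name] = AugmentedAttribute(decl.default, decl.type_name, True)
    return result


# What a plain XML parser would report for <reading taken="2002-06-12"/>.
plain = {"taken": "2002-06-12"}

# Hypothetical schema declarations for the same element.
declarations = {
    "taken": Declaration("date"),
    "unit": Declaration("string", default="celsius"),
}

augmented = augment(plain, declarations)
```

Note that "unit" appears in `augmented` but not in `plain`: a consumer
that never reads the schema sees different data from one that does.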

It is obviously helpful, if not essential, to have type information
around to support query operations.  Given this, there is clearly scope
for a standard way to annotate both an XML instance and its accompanying
infoset with type information [the instance because instances are
interchangeable and interoperable; infosets aren't].

The problem is that we are making the old SGML error all over again.  An 
SGML document can't be parsed at all without reading the schema (DTD), 
and the DTD conflated primitive typing, parsing support, entities, 
default values, and other stuff in a really messy way.

There is nothing wrong whatsoever in annotating XML with type 
information, but the PSVI suffers from the following flaws:

1. the inclusion of default values.  These are sufficiently problematic 
that the IETF is about to recommend they not be used at all, and I for 
one think there is a good case that they should be deprecated for 
architectural reasons.

2. the notion that annotation is necessarily linked to validation.  The 
problem here is with the "PSV" part of the name: there's nothing wrong 
with a Type-Augmented Infoset (TAI), but why link it to validation?

It may be the case that the TAI is based on the simple data types from
XSD (although I would argue for cutting back to a more tractable and
less bloated subset), but the connection to schema, if any, should not
have anything to do with schema *processing*.

So I recommend a TAG finding along the following lines:

1. Type-augmented XML is a good thing and a recommendation should be 
prepared describing it both at the infoset and syntax level. (I gather 
there is already some work along these lines in XML Schema?).  Serious 
consideration should be given to 80/20 points rather than simply 
re-using the plethora of primitive types from XML Schema.
2. Type-augmented XML has nothing to say about default values supplied by
any schema.
3. Any software can create and/or use type-augmented XML, whether or not 
any validation is being performed.
4. Work on XQuery and other things that require a Type-Augmented Infoset 
must not depend on schema processing, and should not have normative 
linkages to any schema language specifications.
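
To illustrate points 2 and 3, here is a minimal Python sketch of software
consuming type-annotated XML with no schema and no validation step.  It
assumes the annotation is carried in the instance by the existing
xsi:type attribute; any agreed syntax would do, and xsi:type is used
only because it already exists.  The CONVERTERS table, the
typed_values() function, and the sample document are hypothetical, and
the "xs:" prefix is matched literally rather than resolved as a QName,
which a real implementation would need to do.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Expanded name of the xsi:type attribute as ElementTree reports it.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# A deliberately small set of types (an assumption for illustration,
# not a proposal for what the 80/20 subset should be).
CONVERTERS = {
    "xs:date": date.fromisoformat,
    "xs:double": float,
    "xs:integer": int,
    "xs:string": str,
}

DOCUMENT = """\
<reading xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <taken xsi:type="xs:date">2002-06-12</taken>
  <level xsi:type="xs:double">-0.25</level>
  <note>no annotation here</note>
</reading>
"""


def typed_values(xml_text):
    """Map each child element name to its value, converted per its annotation.

    No schema is read and nothing is validated; elements without an
    annotation, or with an unknown one, are left as strings.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for child in root:
        type_name = child.get(XSI_TYPE, "xs:string")
        convert = CONVERTERS.get(type_name, str)
        values[child.tag] = convert(child.text or "")
    return values


values = typed_values(DOCUMENT)
```

Nothing here supplies a default value: an element or attribute that is
absent from the instance is simply absent from the result.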

  -Tim

Received on Wednesday, 12 June 2002 14:11:20 UTC