- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 12 Jun 2002 11:13:37 -0700
- To: www-tag@w3.org
[Introductory note: I am not a W3C XML Schema expert, a PSVI expert, or an XQuery expert, so it's perfectly possible that I'm way off-base here. My feelings won't be hurt in the slightest if someone points this out.]

The notion of a PSVI (Post-Schema Validation Infoset) has arisen out of the W3C XML Schema work, and is finding use in XPath 2.0 and XQuery. The PSVI is distinguished from the normal XML infoset as follows:

- the addition of default element/attribute values provided in the schema
- the addition of type information declared in the schema, i.e. you can tell that the content of this attribute is supposed to be a date, and of that element to be a floating-point number in the range -1.0..+1.0.

It is clearly helpful, if not essential, to have type information around to support query operations. Given this, there is clearly scope for a standard way to annotate both an XML instance and its accompanying infoset with its type information [the instance because instances are interchangeable and interoperable, infosets aren't].

The problem is that we are making the old SGML error all over again. An SGML document can't be parsed at all without reading the schema (DTD), and the DTD conflated primitive typing, parsing support, entities, default values, and other stuff in a really messy way.

There is nothing wrong whatsoever in annotating XML with type information, but the PSVI suffers from the following flaws:

1. the inclusion of default values. These are sufficiently problematic that the IETF is about to recommend they not be used at all, and I for one think there is a good case that they should be deprecated for architectural reasons.

2. the notion that annotation is necessarily linked to validation. The problem here is with the "PSV" part of the name: there's nothing wrong with a Type-Augmented Infoset (TAI), but why link it to validation?
It may be the case that the TAI is based on the simple data types from XSD (although I would argue for cutting back to a more tractable and less bloated subset), but the connection to schema, if any, should not have anything to do with schema *processing*.

So I recommend a TAG finding along the following lines:

1. Type-augmented XML is a good thing and a recommendation should be prepared describing it both at the infoset and syntax level. (I gather there is already some work along these lines in XML Schema?) Serious consideration should be given to 80/20 points rather than simply re-using the plethora of primitive types from XML Schema.

2. Type-augmented XML has nothing to say about default values created in any schema.

3. Any software can create and/or use type-augmented XML, whether or not any validation is being performed.

4. Work on XQuery and other things that require a Type-Augmented Infoset must not depend on schema processing, and should not have normative linkages to any schema language specifications.

-Tim
Received on Wednesday, 12 June 2002 14:11:20 UTC