Re: Potential new issue: PSVI considered harmful from noah_mendelsohn@us.ibm.com on 2002-06-18 (www-tag@w3.org from June 2002)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 18 Jun 2002 16:46:16 -0400
To: www-tag@w3.org
Message-ID: <OFF873DB77.C9D1EA91-ON85256BDC.00715FBD@lotus.com>

Rick Jelliffe wrote:

>> I agree with Tim Bray (and perhaps Noah Mendelsohn,
>> in part) that the PSVI should be renamed.
>>
>> * First because it may not be PSV, as he says.
>> 
>> * Second because it does not have a
>>   relationship-preserving re-serialization to XML
>>   (except of course by stripping out the augmentations
>>   and requiring validation again) and therefore is
>>   non-XML. PSVI does not draw out this discontinuity
>>   enough.
>> 
>> * Third because "Schema" is a codeword for W3C XML
>>   Schemas, but other schema languages could be used.

I am actually proposing an additional intermediate
layer of Infoset.  Today we have XML Infoset, which is
a proper subset of the PSVI resulting from W3C Schema
validation.  What I am speculating is that a stack
along the following lines would make sense:

* XML Infoset (as we know it today)

* Type-name-aware infoset (TAI): (details TBD) This would add
  type name information that could come from the instance
  (e.g. xsi:type) and/or from validation with any schema
  language that used a consistent naming convention for types
  (presumably QNames).  We'd have to think about
  whether type names should be specifiable for
  attributes or only for elements.  xsi:type might have
  to be moved out of the Schema rec and into its own
  little rec, feeding the TAI without requiring
  validation.

* W3C XML Schema PSVI: As it exists in the Schema
  recommendation.  This is what you use when your
  application really does want to know about defaults,
  reflected type definitions, what validated, and 
  other data that is by its nature determined during 
  validation.  As I said, the whole point of validation
  is to learn information about the combination
  of a document and a schema.  Formalizing that is 
  a good thing; depending on that information when
  you shouldn't is what's potentially bad.

So, the TAI would take one or a few of the properties
that are now only available in the PSVI, and make clear
that their values can also sometimes be determined
without doing W3C XML Schema validation. Some care
would be needed to avoid inconsistencies when
validation is in fact done using a TAI as input.  I
suspect that's tractable, perhaps in a Schema 1.1 Rec.
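
To make the TAI idea a bit more concrete, here is a
minimal sketch (Python, standard library only) of how
type names might be collected from xsi:type attributes
by looking at the instance alone, with no schema
validation.  The function name type_names and the
http://example.org/po namespace are made up for
illustration, and the result shape is just an assumption,
not a proposed TAI definition.  Only elements are
handled, since whether attributes get type names is
still an open question.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def type_names(xml_text):
    """Return [(element tag, (namespace URI, local name)), ...]
    for every element carrying xsi:type.  No schema is consulted."""
    scopes = []   # stack of (prefix, uri) bindings currently in scope
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            scopes.append(payload)        # payload is (prefix, uri)
        elif event == "end-ns":
            scopes.pop()
        else:                             # "start": payload is the element
            qname = payload.get(XSI_TYPE)
            if qname is None:
                continue
            prefix, _, local = qname.strip().rpartition(":")
            # The innermost binding for the prefix wins; an unprefixed
            # QName uses the default namespace, if one is in scope.
            uri = next((u for p, u in reversed(scopes) if p == prefix), None)
            found.append((payload.tag, (uri, local)))
    return found


# Hypothetical instance document, well-formed but never validated.
doc = """<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xs="http://www.w3.org/2001/XMLSchema"
       xmlns:po="http://example.org/po">
  <total xsi:type="xs:decimal">12.50</total>
  <shipTo xsi:type="po:USAddress"/>
  <note>no type name here</note>
</order>"""

for tag, name in type_names(doc):
    print(tag, name)
```

Note that the sketch only resolves the QName to a
(namespace, local name) pair.  It learns nothing about
defaults, type definitions or validity, which is exactly
the information that would remain in the PSVI layer.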

Tim Bray wrote:

>> ... and if such a thing existed, presumably it's what
>> XQuery/XPath ought to use, as opposed to its current
>> contra-factual assertions that the only way to get
>> types to use in queries is to apply a particular flavor
>> of schema processor.

Regardless of the layering, I think Query and XPath
should give as much value as possible when used with
well-formed documents.  That means having a good story
based only on the XML Infoset, or maybe on the TAI if
that proves to be a good idea.  Both can be determined
only by inspecting the instance in isolation (well, in
isolation from XML schemas -- you may still need external
DTD, Entities, etc. if standalone=no).

Given that starting point, I think the query group
should ask: is there an even more valuable service we
should provide in the case where validation is indeed
to be performed?  I'm neutral on that.  If everything
applications need can be done just knowing the names of
types, then Query should depend only on TAI.  If there
is a more valuable service that can be performed using
the additional information that's only available after
validation, then that should be considered on the
merits.  There still should be the option to use
query with just infoset/TAI, i.e. without doing
validation.

BTW:  these are just my thoughts, don't represent IBM's
corporate position, and haven't been coordinated with
our XML Query team.  On query matters, they can give
you official positions.

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Tuesday, 18 June 2002 17:04:28 UTC