RE: PSVI architectural discussion from Dare Obasanjo on 2002-06-22 (www-tag@w3.org from June 2002)

From: Dare Obasanjo <dareo@microsoft.com>
Date: Sat, 22 Jun 2002 13:06:40 -0700
To: "Tim Bray" <tbray@textuality.com>, <www-tag@w3.org>
Message-ID: <8BD7226E07DDFF49AF5EF4030ACE0B7E06621D35@red-msg-06.redmond.corp.microsoft.com>
-----Original Message----- 
From: Tim Bray [mailto:tbray@textuality.com] 
Sent: Fri 6/21/2002 5:31 PM 
To: www-tag@w3.org 
Cc: 
Subject: PSVI architectural discussion


>1. XML Schema Validation generates information
>
>Validation takes as input an XML instance and one or more XML Schema
>instances, and produces potentially a lot of output.  

>Currently, all of this stuff is lumped together and placed in the "PSVI".

Correct. The purpose of the PSVI is to act as a representation of all the events that occurred during validation of an XML document. It captures all the information a validating processor knows about the document after validation.

>2. Use of PSVI contents
>
>The XML Schema WG is currently engaged in investigating which pieces of
>the PSVI are of potential interest and assembling use cases.  Presumably
>if it emerges that there is wide interest in access to particular PSVI
>items, someone will have to take on the work of publishing an API and
>serialization for them.

This is news to me. I was part of a meeting with our standards reps just a few hours before your email was sent to the TAG, and none of them brought this up. Similarly, looking at the member-only w3c-xml-schema-wg and w3c-xml-schema-ig lists does not turn up any discussion either. Can you please point me to a written record of this discussion so I can properly comment on whatever has been decided?

Secondly, it is a well-known fact that there is interest in particular PSVI items, specifically type information.

Finally, I personally do not think it is the responsibility of the W3C to dictate one-size-fits-all Application Programming Interfaces that may be unable to conform to the naming conventions, design guidelines, or programming idioms of even the more popular programming languages, let alone all of them.

>3. The PSVI contents are heterogeneous

>The PSVI's contents have the sole defining characteristic that they are
>generated as a result of schema validation.  

Exactly. That is why it is called the Post Schema Validation Infoset. 


>4. Do the PSVI contents belong in the infoset?

>On the other hand, it's not obvious that the infoset's framework of
>"items" and "properties" is a good way to describe things like
>validation outcomes and type information.  Let's assume we decide that
>some of this stuff needs to be made available to other parties - is it a
>useful or necessary step to go through the infoset to get there?  I'm
>not being rhetorical here, this is just not obvious to me.

MSXML 4.0 shipped with a document object model (DOM) and a schema object model (SOM). A validated DOM node could retrieve its schema declaration (i.e. its corresponding SOM node) via the getDeclaration() method[0]. This functionality is appreciated by our users, and we often get requests to add similar functionality to our System.Xml namespace in the .NET Framework.
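
As a rough illustration of the idea of a validated node handing back its schema declaration, here is a minimal Python sketch. It is not the MSXML API: the class names, the get_declaration() method, and the name-based lookup are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins; these are NOT the MSXML DOM/SOM classes.
@dataclass
class SchemaDeclaration:
    name: str       # declared element or attribute name
    type_name: str  # e.g. "xs:decimal"


@dataclass
class ValidatedNode:
    name: str
    value: str
    declaration: Optional[SchemaDeclaration] = None  # filled in by validate()

    def get_declaration(self) -> Optional[SchemaDeclaration]:
        """Return the declaration this node was matched against, if any."""
        return self.declaration


def validate(nodes, declarations):
    """Toy 'validation': attach the declaration with the same name to each node."""
    by_name = {d.name: d for d in declarations}
    for node in nodes:
        node.declaration = by_name.get(node.name)
    return nodes


decls = [SchemaDeclaration("unitPrice", "xs:decimal")]
nodes = validate(
    [ValidatedNode("unitPrice", "9.99"), ValidatedNode("note", "hi")],
    decls,
)
```

After validate() runs, nodes[0].get_declaration() returns the "unitPrice" declaration, while the undeclared "note" node returns None. A real processor would of course match declarations by content model rather than by name alone.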

>5. The PSVI type information is itself heterogeneous
> 6. Type naming is tricky
> 7. Type information is useful outside of validation applications

I agree with your observations. 

> 8. Why not standardize on XML Schema's primitive data types?

Strong agreement from me here. 

>Question: are the specs well-enough modularized that it's easy to
>normatively reference in basic types by reference?

Considering that implementations of RELAX NG as well as the W3C XML Query effort both do this already, I believe they are modular enough in that respect.

>Question: For things that are this widely shareable, I think it's
>architecturally essential to have actual URIs, not just qnames; is this
>hard to achieve?

I haven't followed the "QNames must be URIs" discussion in the TAG, but I would also be interested in seeing a response to this.

> 9. Type names and type semantics exist independent of schemas

>The XQuery processor that's accessing this database will know from some
>sort of data dictionary implementation that a <detail> element has
>unitPrice= and quantity= attributes, and the primitive data types of
>each attribute.  

And what format will this data dictionary be in? From looking at XML databases, it will be DTDs and/or W3C XML Schema, so I believe your point here is moot.

>This doesn't in the slightest get in the way of, for example, XQuery semantics.

Being able to perform XQuery queries over a document without a schema works because XQuery will use W3C XML Schema data types as its primitives. I'm not sure whether this proves your point or not, but then again I'm not sure what your point is.
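
To make the schema-less case concrete, here is a small Python sketch of computing with untyped attribute values by casting them explicitly to primitive types. It is only an analogy, not XQuery: the sample document, the CASTS table, and the cast() helper are assumptions made up for the example, and Python's Decimal and int only approximate xs:decimal and xs:integer.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical instance document; no schema is consulted anywhere below.
doc = ET.fromstring('<order><detail unitPrice="9.99" quantity="3"/></order>')

# Toy casts standing in for the xs:decimal and xs:integer constructors.
CASTS = {"xs:decimal": Decimal, "xs:integer": int}


def cast(value: str, type_name: str):
    """Convert an untyped attribute value to the named primitive type."""
    return CASTS[type_name](value.strip())


def line_total(detail):
    """Multiply price by quantity after casting both from untyped text."""
    price = cast(detail.get("unitPrice"), "xs:decimal")
    qty = cast(detail.get("quantity"), "xs:integer")
    return price * qty


total = line_total(doc.find("detail"))
```

The type names come from W3C XML Schema even though no schema document is involved, which is the sense in which the data types serve as primitives here.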

> 10. Coupling specs to PSVI as it exists today is architecturally unsound

>The PSVI is a grab-bag of stuff that's defined as being the outcome of a
>particular operation; any attempt to pretend that all its contents can
>be talked about, addressed, or used in a uniform way is just misguided.

Agreed. I find it hard to imagine any application that will need to utilize all aspects of the PSVI outside of validation episodes. 


>  Also it needs to be crystal-clear that you can have types without having a schema or doing validation.

You can have simple types but not complex types. Even then, one still has to inspect the instance document to make sure the asserted type information is correct, which for all intents and purposes is validation.
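
A minimal Python sketch of why checking an asserted simple type amounts to validation: the only way to confirm the assertion is to test the value's text against the type's lexical form. The LEXICAL table and check_asserted_type() are hypothetical, and the patterns are simplified approximations rather than the full lexical spaces defined by the spec.

```python
import re

# Simplified lexical patterns for two XML Schema primitives (assumed for
# illustration; not the complete definitions from the specification).
LEXICAL = {
    "xs:integer": re.compile(r"[+-]?[0-9]+"),
    "xs:boolean": re.compile(r"true|false|1|0"),
}


def check_asserted_type(value: str, asserted: str) -> bool:
    """Return True if the value's text matches the asserted simple type."""
    return LEXICAL[asserted].fullmatch(value.strip()) is not None
```

For example, check_asserted_type("42", "xs:integer") succeeds while check_asserted_type("forty-two", "xs:integer") fails; in both cases the instance data had to be inspected to find out.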


>Conclusion

>Where I'd like to end up is:


Paul is our TAG rep and I'll let him respond to this himself as opposed to commenting myself. 

[0] http://msdn.microsoft.com/library/en-us/xmlsdk/htm/xml_mth_dg_4tke.asp
Received on Saturday, 22 June 2002 16:07:12 UTC