Validation, Versioning and RDF [Was Re: WSDL WG request for adding multiple version extensibility into Schema 1.1] from David Booth on 2004-02-27 (www-ws-desc@w3.org from February 2004)

From: David Booth <dbooth@w3.org>
Date: Fri, 27 Feb 2004 17:34:22 -0500
To: Bijan Parsia <bparsia@isr.umd.edu>, "David Orchard" <dorchard@bea.com>
Cc: <www-ws-desc@w3.org>, "'Mark Baker'" <distobj@acm.org>,
Message-Id: <5.1.0.14.2.20040227124257.035ec370@localhost>

Bijan,

While I agree with your observation that RDF does not help you do data 
validation, to some extent I think it misses the point.

IMO, the most important use of XML schemas in WS is not to facilitate 
runtime message validation by an XML parser.  It is to clearly indicate the 
intended message format, so that requester and provider entities can be 
sure that their corresponding software agents will know:
         (a) how to generate the right data format;
         (b) how to access the data that they need (i.e., where
             in the parse tree); and
         (c) what datatype it is.

IMO, runtime message validation by an XML parser is a distant secondary 
consideration.  Why?  Because nearly every responsible application MUST do 
its own application-specific data validation anyway, before it makes use of 
the incoming data.  An XML schema in conjunction with a validating parser 
can somewhat reduce the data validation burden, but in general cannot 
eliminate it.  Furthermore, there is a risk in doing part of the
validation in the XML parser: the code that performs the app-specific data
validation is separated from the XML Schema, so the code and the schema can
get out of sync if the schema is later modified.  Hence, it is safest if
the code does *not* rely on the XML parser to validate.  For this reason,
and for performance reasons, many production XML applications turn off
parser validation.
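
As a rough illustration of that point (a sketch only, not taken from WSDL
or any particular toolkit), the code below parses a message with Python's
non-validating xml.etree.ElementTree parser and does all of its checks in
application code.  The message layout, the element names "order", "sku"
and "quantity", the 1..1000 limit and the function name read_order are all
hypothetical.

```python
import xml.etree.ElementTree as ET


def read_order(xml_text):
    """Parse an order message and apply application-specific checks.

    ElementTree does no schema validation, so every check lives here,
    next to the code that uses the data.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "order":
        raise ValueError("expected <order>, got <%s>" % root.tag)

    sku = root.findtext("sku")
    if not sku:
        raise ValueError("missing or empty <sku>")

    quantity_text = root.findtext("quantity")
    if quantity_text is None:
        raise ValueError("missing <quantity>")
    quantity = int(quantity_text)  # raises ValueError if not an integer

    # An application rule with a hypothetical limit, checked in code.
    if not 1 <= quantity <= 1000:
        raise ValueError("quantity out of range: %d" % quantity)

    return {"sku": sku, "quantity": quantity}
```

Because the checks sit in one function, a later change to the message
format has a single place in the code that must be updated with it.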

Returning to the list above, an XML schema is very good for indicating the 
location of data items in the parse tree, along with their data 
types.  But if RDF is used, then the format of the data (where each item
sits in the parse tree) is largely a non-issue, and RDF can also indicate
the data types.

More comments in line . . .

>At 11:59 AM 2/15/2004 -0500, Bijan Parsia wrote:
>>
>>On Feb 15, 2004, at 1:31 AM, David Orchard wrote:
>>
>>. . .
>>Well, the reason that I want "ignore unknowns" is because I know that
>>"ignore unknowns" has been deployed on the web for >10 years and it works
>>for versioning.  If there's another solution, I'm really really really
>>interested in it.
>
>The extra bit, perhaps, is the validation. Although required known fields 
>are ubiquitous where you have ignore unknowns :)

+1

The reason the "ignore unknowns" rule for optional extensions works so well 
for so many languages is that they have implicitly been making two very 
important assumptions: (a) that a statement in a given language represents 
a set of assertions; and (b) that the language makes the "open world" 
assumption.  (The "open world assumption" means that a statement in the 
language is not assumed to tell you *everything* that could possibly be 
true about its subject.  There may be additional things true about its 
subject (for example, additional behavior) beyond what the statement 
asserts.)  Of course, sets of assertions in an open world are what RDF is
all about, so this comes naturally to RDF.
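
A minimal sketch of the two assumptions, in plain Python and not using any
RDF library: a message is modelled as a set of (subject, predicate,
object) tuples, and a consumer skips the predicates it does not know.  The
"ex:" names, KNOWN_PREDICATES and the function describe are hypothetical.

```python
# Predicates this (hypothetical) version-1 consumer understands.
KNOWN_PREDICATES = {"ex:name", "ex:price"}


def describe(subject, assertions):
    """Collect what this consumer understands about `subject`.

    `assertions` is a set of (subject, predicate, object) tuples.  Tuples
    with unfamiliar predicates are skipped, not rejected: under the open
    world assumption they are further facts that this consumer simply
    does not use.
    """
    known = {}
    for s, p, o in assertions:
        if s == subject and p in KNOWN_PREDICATES:
            known[p] = o
    return known


v1_message = {
    ("ex:item1", "ex:name", "widget"),
    ("ex:item1", "ex:price", 10),
}
# A later version adds an assertion; nothing is taken away.
v2_message = v1_message | {("ex:item1", "ex:colour", "red")}

# The version-1 consumer gets the same answer from both messages.
print(describe("ex:item1", v1_message) == describe("ex:item1", v2_message))
```

The added "ex:colour" assertion changes nothing for the older consumer,
which is the behaviour the "ignore unknowns" rule relies on.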

>>  . . .
>>So, what are the requirements:
>>1. Types that are valid have type information
>>2. Types that are not known do not break validation
>>3. Types allow for arbitrary extensibility in ways not predicted by the
>>Version N schema author.
>>4. Types that are not known and optional can be added without breaking
>>compatibility (same as #2?)
>>5. Types that are known and not allowed break validation.
>>
>>Assuming that these are roughly the requirements for doing compatible
>>versioning, Bijan, what would the RDF/XML look like to express these
>>assurances?
>
>Can't. Not even with OWL.  . . . .

But if you relax the validation requirement, and instead focus on the twin 
tasks of (1) ensuring that the parties agree on what data to expect, and 
(2) *processing* the data; then I think RDF *does* meet these requirements 
very well.

In other words, if we re-write the above requirements as:
   1. Data items that are expected have type information
   2. Data items that are not expected do not break processing
   3. Input data allows for arbitrary extensibility in ways
      not predicted by the Version N schema author.
   4. Data items that are not known and optional can be added
      without breaking compatibility (same as #2?)
   5. (N/A if validation requirement is relaxed)

and we can add a further requirement that may have been implicit earlier:
   6. A machine-processable document can clearly indicate what
      data items (and types) are expected

then I think RDF meets the requirements nicely.
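
To make the rewritten list concrete, here is a small sketch in plain
Python (again not an RDF toolkit).  The EXPECTED table is a hypothetical
stand-in for the machine-processable document of requirement 6, and the
"ex:" property names and the function process are likewise assumptions.

```python
# Hypothetical stand-in for requirement 6: a machine-readable declaration
# of the data items a consumer expects, with their types.
EXPECTED = {
    "ex:name": str,
    "ex:price": int,
}


def process(items):
    """Return the expected items from `items`, a dict of property -> value.

    Requirement 1: each expected item is checked against its declared type.
    Requirements 2 and 4: items not listed in EXPECTED are left alone and
    do not cause a failure.
    """
    result = {}
    for prop, expected_type in EXPECTED.items():
        if prop not in items:
            raise KeyError("expected data item missing: " + prop)
        value = items[prop]
        if not isinstance(value, expected_type):
            raise TypeError(prop + " should be " + expected_type.__name__)
        result[prop] = value
    return result
```

An input that carries an extra, undeclared item is processed normally,
while an expected item that is missing or has the wrong type is reported
by the processing code itself.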

>The hard bit is what "valid" means. . . . .

+1

In short, the ability to have your XML parser validate incoming messages 
against a schema is nice, and can be useful.  But it is only a part of the 
story, and IMO it is a much less important part than other considerations.


-- 
David Booth
W3C Fellow / Hewlett-Packard
Telephone: +1.617.253.1273
Received on Friday, 27 February 2004 17:34:26 UTC