Re: Validation, Versioning and RDF [Was Re: WSDL WG request for adding multiple version extensibility into Schema 1.1] from David Booth on 2004-03-03 (www-ws-desc@w3.org from March 2004)

From: David Booth <dbooth@w3.org>
Date: Wed, 03 Mar 2004 11:33:36 -0500
To: Bijan Parsia <bparsia@isr.umd.edu>
Cc: <www-ws-desc@w3.org>, "David Orchard" <dorchard@bea.com>, "'Mark Baker'" <distobj@acm.org>
Message-Id: <5.1.0.14.2.20040301162536.034f28e8@localhost>
At 10:26 PM 2/27/2004 -0500, Bijan Parsia wrote:
>On Feb 27, 2004, at 5:34 PM, David Booth wrote:
>
>>Bijan,
>>
>>While I agree with your observation that RDF does not help you do data 
>>validation, to some extent I think it misses the point.
>
>Well, not in so far as David Orchard has a validation requirement. We do 
>need to keep the context of the discussion :)
>
>Without an XML Schema (or similarly inexpressive schema language, i.e., 
>without the ignore unknowns) validation requirement, then there is no 
>problem. Well formed XML can be as ignore-unknowns-versioning friendly as 
>RDF (well, maybe not *quite*, but largely good enough).

Yes, it *can* be, but in practice it is generally easier to accommodate 
change in relational data models than in tree-structured models such as 
XML, because trees are much more brittle.  IMO, this is the main reason 
relational databases won the database wars 20 years ago when they were 
introduced: you can easily add new tables (i.e., new relations) to a 
relational database without breaking existing application code.  However, 
if you add new data to an XML schema for a tree-structured data format (for 
example by inserting siblings or by inserting new levels of hierarchy) you 
are much more likely to break existing application code.  It is possible to 
avoid, but it's harder than in the relational world.
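
To make the contrast concrete, here is a rough sketch using Python's standard sqlite3 and xml.etree.ElementTree modules.  The table names, element names and helper functions are all invented for illustration; this is not drawn from any real schema under discussion.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational side: an existing query, written before any extension.
def customer_names(conn):
    rows = conn.execute("SELECT name FROM customer ORDER BY id")
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Alice')")
before = customer_names(conn)

# Extension: a new relation is added; the old table is untouched.
conn.execute("CREATE TABLE loyalty (customer_id INTEGER, points INTEGER)")
conn.execute("INSERT INTO loyalty VALUES (1, 42)")
after = customer_names(conn)

# Tree side: existing code that finds data by its position in the tree.
def customer_name_v1(doc):
    node = ET.fromstring(doc).find("./customer/name")
    return None if node is None else node.text

v1_doc = "<order><customer><name>Alice</name></customer></order>"
# Extension: a new level of hierarchy is inserted above <customer>.
v2_doc = ("<order><party><customer><name>Alice</name></customer>"
          "</party></order>")

print(before, after)             # ['Alice'] ['Alice']
print(customer_name_v1(v1_doc))  # Alice
print(customer_name_v1(v2_doc))  # None
```

A consumer that had searched with ".//name" instead of a fixed path would survive this particular change, which is what I mean by "possible to avoid": it takes deliberate effort on the tree side, whereas the relational query was unaffected without any special care.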

>I didn't see Mark Baker as arguing that David's requirements were bad, 
>just that RDF (and OWL) met them by design.
>
>Ok, so now you are, technically, arguing against David's requirement.

Correct.  I am saying that although I think validation is useful, I think 
its role is sometimes overemphasized.  If you relax the validation 
requirement and instead focus on the bigger picture of (A) ensuring that 
the parties agree on what data to expect and (B) ensuring that they can 
process the data properly without breaking in the face of ignorable 
extensions, then I think RDF *can* help meet part B.  (I previously assumed 
that OWL could also address part A, but as noted below, Bijan has since 
corrected me on this point.)
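
As a rough illustration of part B only, here is a sketch in which plain Python tuples stand in for RDF triples.  No RDF library is used, and the "ex:" names and helper functions are made up for the example.

```python
# Each triple is one independent assertion: (subject, predicate, object).
v1_graph = {
    ("ex:order1", "ex:customer", "ex:alice"),
    ("ex:alice", "ex:name", "Alice"),
}

# A newer sender adds an assertion using a predicate that the
# consumer below has never seen.
v2_graph = v1_graph | {
    ("ex:alice", "ex:loyaltyPoints", "42"),
}

def objects(graph, subject, predicate):
    """Return every object asserted for (subject, predicate), sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

def customer_names(graph, order):
    """Consumer written against v1: follows only predicates it knows."""
    names = []
    for customer in objects(graph, order, "ex:customer"):
        names.extend(objects(graph, customer, "ex:name"))
    return names

print(customer_names(v1_graph, "ex:order1"))  # ['Alice']
print(customer_names(v2_graph, "ex:order1"))  # ['Alice']
```

Because the consumer selects assertions by predicate rather than by position, the extra triple is simply never looked at.  Note that nothing here tells either party which triples to expect, so the sketch says nothing about part A.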

>>IMO, the most important use of XML schemas in WS is not to facilitate 
>>runtime message validation by an XML parser.
>
>Let's grant that.
>
>>   It is to clearly indicate the intended message format, so that 
>> requester and provider entities can be sure that their corresponding 
>> software agents will know:
>>         (a) how to generate the right data format;
>>         (b) how to access the data that they need (i.e., where
>>             in the parse tree); and
>>         (c) what datatype it is.
>
>None of this is achievable (at least straightforwardly) with RDF and/or OWL.

Wow, I stand corrected.  I just assumed that you could do this with OWL, 
but I haven't studied the spec, so I guess I should!  (Note to reader: 
Bijan, DaveO and I had a lengthy hallway discussion about this today at the 
Tech Plenary, in which Bijan explained this limitation of OWL quite 
convincingly.  Thanks Bijan!)

>>. . .
>>The reason the "ignore unknown" rule for optional extensions works so 
>>well for so many languages is that they have implicitly been making two 
>>very important assumptions: (a) that a statement in a given language 
>>represents a set of assertions; and (b) that the language makes the "open 
>>world" assumption.
>
>I don't believe so.
>
>>  (The "open world assumption" means that a statement in the language is 
>> not assumed to tell you *everything* that could possibly be true about 
>> its subject.  There may be additional things true about its subject (for 
>> example, additional behavior) beyond what the statement asserts.)
>
>I'm pretty sure I know what the open world assumption is :) But that ain't 
>it. Well, it is sort of, but it's not the canonical way of putting it :)
>It's not usually put out about *statements* but about knowledge or 
>databases. You make the closed world assumption when you assume that your 
>KB is *complete*, i.e., has all the positive information about your entities.

Yes, I realize that.  I was trying to state it in a way that would be more 
directly applicable to this context (i.e., viewing statements in a language 
as representing sets of assertions).

>Ok, we get some equivalence if we assume that by "statement" you meant "the 
>conjunction of the deductive closure of all the assertions about the 
>subject in the kb". Hmm, that misses a few things as well, but you get the 
>drift.

Yes, that's what I meant.

. . .
>  Perhaps you are right about ignore unknowns, but there has to be more to 
> the story.
>
>Actually, I don't believe you are right. I don't think they are assertions 
>and I don't believe there's an OWA. Your argument seems metaphorical.

Well, you're right that my argument is metaphorical in the sense that I'm 
looking back historically at extensibility in languages, and trying to draw 
conclusions about why the "MUST IGNORE" and "MUST UNDERSTAND" mechanisms 
work so well.  (See the WebArch document's advice on these at 
http://www.w3.org/TR/2003/WD-webarch-20031209/#pr-allow-exts .)  If you 
have a different explanation that you think offers more insight, I'd be 
interested in hearing it.
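
For readers less familiar with the two mechanisms, here is a minimal sketch of how a receiver might apply them.  The message shape (a list of name/value/flag tuples), the KNOWN set and the NotUnderstood exception are all hypothetical; this is not how the WebArch document or any particular spec defines the processing.

```python
# Names this (hypothetical) receiver was written to understand.
KNOWN = {"to", "body"}

class NotUnderstood(Exception):
    """Raised for an unknown item that the sender marked as mandatory."""

def process(message):
    """message is a list of (name, value, must_understand) tuples."""
    accepted = {}
    for name, value, must_understand in message:
        if name in KNOWN:
            accepted[name] = value
        elif must_understand:
            # MUST UNDERSTAND: refuse rather than silently misprocess.
            raise NotUnderstood(name)
        # MUST IGNORE: unknown and optional, so it is skipped.
    return accepted

ok = process([
    ("to", "bob", False),
    ("priority", "high", False),   # unknown but ignorable extension
    ("body", "hello", False),
])
print(ok)  # {'to': 'bob', 'body': 'hello'}

try:
    process([("to", "bob", False), ("encrypt", "aes", True)])
except NotUnderstood as exc:
    print("rejected:", exc)  # rejected: encrypt
```

The receiver treats whatever it does recognize as true of the message and makes no claim that the recognized items are the whole story, which is the sense in which I am comparing "MUST IGNORE" to an open world reading.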


-- 
David Booth
W3C Fellow / Hewlett-Packard
Telephone: +1.617.253.1273
Received on Wednesday, 3 March 2004 11:34:44 UTC