Re: Validation, Versioning and RDF [Was Re: WSDL WG request for adding multiple version extensibility into Schema 1.1] from Bijan Parsia on 2004-02-28 (www-ws-desc@w3.org from February 2004)

From: Bijan Parsia <bparsia@isr.umd.edu>
Date: Fri, 27 Feb 2004 22:26:17 -0500
To: "David Booth" <dbooth@w3.org>
Cc: <www-ws-desc@w3.org>, "David Orchard" <dorchard@bea.com>, "'Mark Baker'" <distobj@acm.org>
Message-Id: <DEABB3FF-699D-11D8-9AD6-0003939E0B44@isr.umd.edu>

On Feb 27, 2004, at 5:34 PM, David Booth wrote:

> Bijan,
>
> While I agree with your observation that RDF does not help you do data 
> validation, to some extent I think it misses the point.

Well, not insofar as David Orchard has a validation requirement. We 
do need to keep the context of the discussion :)

Without an XML Schema (or similarly inexpressive schema language, i.e., 
one without ignore unknowns) validation requirement, there is no 
problem. Well formed XML can be as ignore-unknowns-versioning friendly 
as RDF (well, maybe not *quite*, but largely good enough). I didn't see 
Mark Baker as arguing that David's requirements were bad, just that RDF 
(and OWL) met them by design.
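
A minimal sketch of "ignore unknowns" over plain well-formed XML, assuming a hypothetical <person> message with one required field (name) and one optional field (email). The element names and the read_person helper are illustrative only and do not come from any WSDL or schema in this thread. It reads the fields it knows, insists on the required ones, and silently skips everything else:

```python
import xml.etree.ElementTree as ET

# Hypothetical field names, chosen only for illustration.
REQUIRED = {"name"}
OPTIONAL = {"email"}


def read_person(xml_text):
    """Read known fields, require the required ones, skip unknown ones."""
    root = ET.fromstring(xml_text)
    known = REQUIRED | OPTIONAL
    # Unknown child elements are simply not copied into the record.
    record = {child.tag: (child.text or "") for child in root if child.tag in known}
    missing = REQUIRED - set(record)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return record


v1 = "<person><name>Maria</name></person>"
# A later version adds an element that this consumer has never heard of.
v2 = "<person><name>Maria</name><nickname>Mia</nickname></person>"

print(read_person(v1))
print(read_person(v2))
```

Both messages yield the same record, so the newer sender does not break the older consumer. A message without <name> is still rejected, which is the "required known fields" half of the pattern.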

Ok, so now you are, technically, arguing against David's requirement.

> IMO, the most important use of XML schemas in WS is not to facilitate 
> runtime message validation by an XML parser.

Let's grant that.

>   It is to clearly indicate the intended message format, so that 
> requester and provider entities can be sure that their corresponding 
> software agents will know:
>         (a) how to generate the right data format;
>         (b) how to access the data that they need (i.e., where
>             in the parse tree); and
>         (c) what datatype it is.

None of this is achievable (at least straightforwardly) with RDF and/or 
OWL. Or rather, I suspect for each of these that it's either trivial or 
impossible, depending on what you mean. I'm working on some ways to do 
some of this, but it's not easy. There was a thread on public-sws-ig 
that discussed what I call the Decker problems:
	http://lists.w3.org/Archives/Public/public-sws-ig/2004Feb/0037.html

I meant to post another message specifically focusing on that, but have 
been swamped. I intend to do a presentation on this the week after the 
tech plenary and to write a paper on it for ISWC. If people would like 
to discuss it at the Tech Plenary, I'd be happy to.

Taking them in order:
	(a) If by "data format" you mean "serialization", then you have 
Trivial: always RDF/XML, and Impossible: putting in all and exactly the 
triples you want (especially if you want ground triples). Ok, it's not 
*impossible* in some sense, but you aren't going to get it, afaict, by 
saying "instances of some rdfs or owl class".
	(b) I'm not sure this is meaningful for RDF. Perhaps something about 
the sorts of query one must make.
	(c) OWL classes aren't datatypes, in the standard sense. Hence much of 
the difficulty. They *don't* tell you about the structure of a record 
or set of bits. They don't tell you about the valid operations on 
values of that type. They are neither concrete nor abstract datatypes 
in the standard programming language sense. A Person class isn't 
naturally thought of as a set of objects or records describing persons, 
but as a set of *persons*.

> IMO, runtime message validation by an XML parser is a distant 
> secondary consideration.  Why?  Because nearly every responsible 
> application MUST do its own application-specific data validation 
> anyway, before it makes use of the incoming data.  An XML schema in 
> conjunction with a validating parser can somewhat reduce the data 
> validation burden, but in general cannot eliminate it.   Furthermore, 
> there is a risk to doing part of the validation in the XML parser: The 
> code that performs the app-specific data validation is separated from 
> the XML Schema, and this means that they can get out of sync if the 
> schema is later modified.  Hence, it is safest if the code does *not* 
> rely on the XML parser to validate.  For this and performance reasons, 
> many production XML applications turn off parser validation.

Most of this is moot to me as it's not my requirement. I'd be 
interested in David Orchard's response.

> Returning to the list above, an XML schema is very good for indicating 
> the location of data items in the parse tree, along with their data 
> types.   But if RDF is used, then the format of the data is largely a 
> non-issue, and RDF can also indicate the data types.

Actually, for some value of format it *is* an issue. And RDF can't 
indicate the *data* type of abstract individuals (in the normal case).

If you examine the public-sws-ig posts, you'll see there's a dual 
problem: Possibly too *many* triples, and possibly too *few* (of the 
right sort).

If you say that you consume a Person (and, btw, think about how *odd* 
that reads...there are other issues!) and a Person has a Parent who is 
a person...how much of the ancestor tree should you send? What about 
all the *other* properties that your specific Person might have? 
There's no principled way to draw the line using OWL or RDFS class 
definitions.
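
The "too many triples" side can be sketched with a toy triple store, assuming hypothetical individuals and properties (maria, hasParent, shoeSize) and a made-up describe helper. The depth cutoff is an arbitrary application choice; nothing in the Person class definition supplies it:

```python
# Toy knowledge base as (subject, property, object) triples.
# All names here are hypothetical.
KB = {
    ("maria", "type", "Person"),
    ("maria", "hasParent", "ana"),
    ("maria", "shoeSize", "38"),
    ("ana", "type", "Person"),
    ("ana", "hasParent", "rosa"),
    ("rosa", "type", "Person"),
    ("rosa", "hasParent", "ines"),
}


def describe(kb, subject, depth):
    """Collect triples about subject, following hasParent up to depth hops."""
    selected = set()
    frontier = {subject}
    for _ in range(depth + 1):
        triples = {t for t in kb if t[0] in frontier}
        selected |= triples
        # Next hop: the parents named by the triples just collected.
        frontier = {o for (s, p, o) in triples if p == "hasParent"}
    return selected


for depth in range(3):
    print(depth, len(describe(KB, "maria", depth)))
```

Every choice of depth gives a different message that is an equally good "Person", and the unrelated shoeSize triple comes along at every depth.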

On the flip side, suppose you consume a Parent (because, say, you are 
evaluating child benefits). You know, by the class definition, that a 
Parent is any Person with at least one Child. I have a kb which claims 
that Maria is a Parent but *says nothing else about Maria*. Oops! This 
is consistent! I can *infer* that Maria has at least one child, but I 
have no idea who that is.
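
The "too few triples" side, as a toy sketch. This is not an OWL reasoner; the two checks below are hand-written stand-ins, under the assumption that "Parent" means "Person with at least one hasChild", and all names are hypothetical:

```python
# The kb says Maria is a Parent and nothing else about her.
KB = {("maria", "type", "Parent")}


def children_of(kb, person):
    """Return the explicitly named children of person."""
    return {o for (s, p, o) in kb if s == person and p == "hasChild"}


def closed_world_valid(kb, person):
    """Schema-style check: a Parent record must list at least one child."""
    return len(children_of(kb, person)) >= 1


def open_world_has_some_child(kb, person):
    """Open-world reading: being a Parent entails that some child exists,
    whether or not the kb names one."""
    return (person, "type", "Parent") in kb


print(closed_world_valid(KB, "maria"))
print(open_world_has_some_child(KB, "maria"))
print(children_of(KB, "maria"))
```

The record fails the schema-style check, yet the kb is perfectly consistent under the open-world reading: the existence of a child follows, and the set of known children is still empty.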

Actually, suppose I *have* all her children properly related in my kb, 
but my application doesn't care about the children, only the fact of 
parenthood. How do I know *not* to send all the child information?

I don't. I can't, actually.

So, even barring validation, there is a problem. It's a different 
problem :) But it's still a problem. It's a problem related to 
validation *in that* information suitable for data validation also is 
sufficient to express certain kinds of data constraints and contracts. 
This is a *client* NOT a service side issue.

> More comments in line . . .
>
>> At 11:59 AM 2/15/2004 -0500, Bijan Parsia wrote:
>>>
>>> On Feb 15, 2004, at 1:31 AM, David Orchard wrote:
>>>
>>> . . .
>>> Well, the reason that I want "ignore unknowns" is because I know that
>>> "ignore unknowns" has been deployed on the web for >10 years and it
>>> works for versioning.  If there's another solution, I'm really really
>>> really interested in it.
>>
>> The extra bit, perhaps, is the validation. Although required known 
>> fields are ubiquitous where you have ignore unknowns :)
>
> +1
>
> The reason the "ignore unknown" rule for optional extensions works so 
> well for so many languages is that they have implicitly been making 
> two very important assumptions: (a) that a statement in a given 
> language represents a set of assertions; and (b) that the language 
> makes the "open world" assumption.

I don't believe so.

>  (The "open world assumption" means that a statement in the language 
> is not assumed to tell you *everything* that could possibly be true 
> about its subject.  There may be additional things true about its 
> subject (for example, additional behavior) beyond what the statement 
> asserts.)

I'm pretty sure I know what the open world assumption is :) But that 
ain't it. Well, it is sort of, but it's not the canonical way of putting 
it :) It's not usually stated about *statements* but about knowledge 
bases or databases. You make the closed world assumption when you assume 
that your KB is *complete*, i.e., has all the positive information about 
your entities. Ok, we get some equivalence if we assume that by 
"statement" you meant "the conjunction of the deductive closure of all 
the assertions about the subject in the kb". Hmm. That misses a few 
things as well, but you get the drift.
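
The difference can be sketched as two ways of answering the same query against the same KB. The KB contents and function names are hypothetical, and the open-world version only distinguishes "known true" from "unknown"; it does no inference:

```python
from typing import Optional

# A one-fact knowledge base; names are hypothetical.
KB = {("maria", "hasChild", "luis")}


def ask_closed(kb, triple) -> bool:
    """Closed world: the KB is assumed complete, so absence means false."""
    return triple in kb


def ask_open(kb, triple) -> Optional[bool]:
    """Open world: absence means unknown (None), not false."""
    return True if triple in kb else None


query = ("maria", "hasChild", "sofia")
print(ask_closed(KB, query))
print(ask_open(KB, query))
```

Under the closed world assumption Maria has exactly one child. Under the open world assumption she has at least one, and the question about Sofia stays open.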

>  Of course, sets of assertions in a open world are what RDF is all 
> about, so this comes naturally to RDF.

Eh. It's perfectly pointless to talk about RDF, as RDF is so 
inexpressive that there are almost no constraints. Saying that 
something is rdf:type FOO tells you almost nothing at all!

Even RDFS is very weak.

OWL is pretty strong and makes the open world assumption, but so? For 
knowing exactly what to put in a message, the OWA is a liability. At 
least, the unrestricted OWA. Perhaps you are right about ignore 
unknowns, but there has to be more to the story.

Actually, I don't believe you are right. I don't think they are 
assertions and I don't believe there's an OWA. Your argument seems 
metaphorical.

>>>  . . .
>>> So, what are the requirements:
>>> 1. Types that are valid have type information
>>> 2. Types that are not known do not break validation
>>> 3. Types allow for arbitrary extensibility in ways not predicted by
>>> the Version N schema author.
>>> 4. Types that are not known and optional can be added without
>>> breaking compatibility (same as #2?)
>>> 5. Types that are known and not allowed break validation.
>>>
>>> Assuming that these are roughly the requirements for doing
>>> compatible versioning, Bijan, what would the RDF/XML look like to
>>> express these assurances?
>>
>> Can't. Not even with OWL.  . . . .
>
> But if you relax the validation requirement, and instead focus on the 
> twin tasks of (1) ensuring that the parties agree on what data to 
> expect,

I hope my examples above show that this doesn't work (at least with OWL 
class expressions)

>  and (2) *processing* the data; then I think RDF *does* meet these 
> requirements very well.

Nope. Or rather, I don't quite see this part.

> In other words, if we re-write the above requirements as:
>   1. Data items that are expected have type information

What do you mean by type information?

>   2. Data items that are not expected do not break processing
>   3. Input data allows for arbitrary extensibility in ways
>      not predicted by the Version N schema author.
>   4. Data items that are not known and optional can be added
>      without breaking compatibility (same as #2?)
>   5. (N/A if validation requirement is relaxed)
>
> and we can add a further requirement that may have been implicit 
> earlier:
>   6. A machine-processable document can clearly indicate what
>      data items (and types) are expected

This is what I believe isn't the case.

> then I think RDF meets the requirements nicely.

Sorry, really no.

>> The hard bit is what "valid" means. . . . .
>
> +1
>
> In short, the ability to have your XML parser validate incoming 
> messages against a schema is nice, and can be useful.  But it is only 
> a part of the story, and IMO it is a much less important part than 
> other considerations.

Fine. The other considerations still fail. It's a brutal truth, but 
there you go. RDF and OWL are designed for different things. Or more 
precisely, RDF(S) and OWL *classes* are designed for different things. 
It's far from clear to me that they *cannot* be useful for this sort of 
thing, but it's an open question as to exactly *how*. Especially given 
a message passing architecture.

Cheers,
Bijan Parsia.
Received on Friday, 27 February 2004 22:26:19 UTC