Re: RDF Issue rdfs-clarify-subClass-and-instance from Frank Manola on 2002-08-29 (www-rdf-comments@w3.org from July to September 2002)

From: Frank Manola <fmanola@mitre.org>
Date: Thu, 29 Aug 2002 08:44:27 -0400
To: graham wideman <graham@wideman-one.com>
CC: Brian McBride <bwm@hplb.hpl.hp.com>, www-rdf-comments@w3.org, phayes@ai.uwf.edu
Message-ID: <3D6E172B.10207@mitre.org>

Graham--

Thanks for the comments.  Responses below.

graham wideman wrote:

> Frank:
> 
> Thanks for your detailed comments. You've at least not alerted me to something that I had completely missed!   Now to the two issues under consideration:
> 


I'm glad to hear that!


> 1. The "Property implies class" Problem
> ----------------------------------------
> Much of what you (and various RDF docs) say about the impact of rdfs:domain is along the lines of:
> 
> 
>>we're not *forcing* resources to become instances of classes by making
>>rdfs:domain declarations;
>>
> 
> ... suggesting that my private app could regard the rdfs:domain statements any way it likes, in particular that it can employ the rdfs:domain and related features to support the constraints that I outline.  
> 
> But your assertion ("not forcing") seems incompatible with what the docs say. The latest Primer, I now see, is in line with the Model Theory (2002-04-29) doc section 4.2, which provides rule rdfs2:
> 
> if E contains:
>   xxx aaa yyy
>   aaa [rdfs:domain] zzz
> 
> then add:
>   xxx [rdf:type] zzz


Certainly one of the points of the Primer rewrite was to have the words 
be more clearly consistent with the Model Theory.


> 
> To me this says that if I claim that my app is using ("conforming to") RDFS, then it has to abide by this entailment. So, when I get to your example:
>


Not quite.  It might DO the entailment, but it doesn't have to "abide 
by" it.  More on this below.

 
> 
>>ex:author rdfs:domain ex:Book
>>
>>ex:Garfinkle rdf:type ex:Cat
>>ex:Garfinkle ex:author "Fred Smith"
>>
>>[... must the app conclude the following?...]
>>ex:Garfinkle rdf:type ex:Book
>>
> 
>>your processing application could say a number of things:
>>
>>1.  The instance data must be wrong:  Garfinkle must be a Book rather
>>than a Cat
>>2.  The instance data must be wrong:  we'll say that it's invalid
>>3.  The schema must be wrong:  Cats can clearly have authors too
>>4.  Everyone's right:  Garfinkle is both a Book and a Cat
>>
> 
> ... I believe that the RDFS entailment above requires my app to decide 4, "Garfinkle is both a Book and a Cat" if it wants to claim compliance with RDFS. 
> 
> If you are saying that my supposedly RDFS-complying app is free *not* to abide by the entailment above, then what is the distinction between RDFS rules that are required, and RDFS rules that are not?  Perhaps my app could freely ignore subClassOf and rdf:type as well and still be an OK RDFS app?
> 


Pat may want to put this differently (or even contradict it), and maybe 
I didn't put this well to start with, but here's my take on this:  The 
closure rules defined in the Model Theory (one of which you've quoted 
above) effectively say "here's the additional information that you can 
infer based on (a) the information in the schema triples and (b) the 
instance data (RDF graph) you're considering."   The result of doing 
these inferences is an expanded graph that effectively contains all the 
information you then have available.  However, those rules don't require 
that your processor "abide by" those inferences in preference to other 
information in the resulting graph.  Graphs can contain information that 
appears odd (or just plain wrong) to an application.  In this case, the 
rules say that, based on the combination of schema information and 
instance data, you (at least conceptually) have

[schema] ex:author rdfs:domain ex:Book

[instance data] ex:Garfinkle rdf:type ex:Cat
[instance data] ex:Garfinkle ex:author "Fred Smith"

[inference] ex:Garfinkle rdf:type ex:Book

But the rules don't say what you must do with this information.  In 
particular, the rules don't say your application must ignore that its 
intention is to treat the schema information as a constraint, and go 
with the inference instead.  All the rules say is, in effect, "this is 
what the combination of the schema and your instance data imply."  The 
situation here, as far as the application is concerned, is that it's got 
what it considers an anomaly:  that Garfinkle is both a Cat and a Book.
We can't declare that Cats and Books are disjoint in RDFS (we don't
have the language for it), but the application, written to enforce that 
constraint, can look at this information and raise an error.
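
To make that concrete, here is a minimal Python sketch (an illustration only, not taken from the Model Theory document or any RDF toolkit).  It applies rule rdfs2 to the example triples and then runs a separate, application-level check.  The names rdfs2_closure, find_disjoint_conflicts and DISJOINT are hypothetical, and the disjointness of Cats and Books is the application's own assumption, since RDFS cannot state it.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def rdfs2_closure(triples):
    """Return the input triples plus everything rule rdfs2 adds.

    rdfs2: if the graph has (x, p, y) and (p, rdfs:domain, c),
    then add (x, rdf:type, c).  Repeats until nothing new appears.
    """
    graph = set(triples)
    while True:
        domains = {(s, o) for (s, p, o) in graph if p == RDFS_DOMAIN}
        inferred = {
            (s, RDF_TYPE, cls)
            for (s, p, _o) in graph
            for (prop, cls) in domains
            if p == prop
        }
        if inferred <= graph:
            return graph
        graph |= inferred


# Application-level assumption, NOT expressible in RDFS:
# the application treats these two classes as disjoint.
DISJOINT = {frozenset({"ex:Cat", "ex:Book"})}


def find_disjoint_conflicts(graph, disjoint=DISJOINT):
    """List (resource, (class, class)) pairs that the application rejects."""
    types = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, tuple(sorted(pair)))
        for s, classes in types.items()
        for pair in disjoint
        if pair <= classes
    )


data = [
    ("ex:author", RDFS_DOMAIN, "ex:Book"),        # schema
    ("ex:Garfinkle", RDF_TYPE, "ex:Cat"),         # instance data
    ("ex:Garfinkle", "ex:author", "Fred Smith"),  # instance data
]
closed = rdfs2_closure(data)
conflicts = find_disjoint_conflicts(closed)
```

The closure step only adds the inferred triple; it is the second function, which belongs to the application rather than to RDFS, that decides the result is an anomaly and could raise an error.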

A lot of the schema constraint checking capability that you (I think) 
have in mind involves a processor making a lot of additional assumptions 
that, in processing data from potentially multiple sources on the Web, 
we don't want to build into the definition of RDFS.  For example, lots 
of type checking systems would say that if you define a particular 
property as applying to a given type (like authors to Books), a Book 
would be an illegal instance if it appeared in instance data without an 
author property.  But that's an *additional* assumption, not one that 
necessarily follows from saying that the author property applies to Books.
Similarly, lots of type checking systems would say that if you define a 
type with a specific set of properties, if an instance appears with a 
property not defined in that set, it's an illegal instance.  But that 
involves the additional assumption that you've specified (and that the 
processor has found on the Web) *only* the properties that you intend to 
apply to that type.  Many of these sorts of assumptions assume more of a 
closed world than we can afford to assume in RDFS (although a particular 
application may well enforce those constraints).  What we've done in 
RDFS is define a fairly minimal facility for indicating what properties 
people intend to use for what types, leaving it up to applications to
decide how they want to use that information (and how to enforce the
consequences of those intentions).
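
As an illustration of how such additional assumptions live in the application rather than in RDFS, a minimal Python sketch follows.  The function closed_world_violations and both of its checks are hypothetical, as is the resource ex:MobyDick; the checks encode closed-world assumptions that an application might choose to make, and RDFS itself licenses neither.  The sketch looks only at declared rdf:type triples and does not apply any inference first.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def closed_world_violations(triples):
    """Report instances that break two application-chosen rules.

    Assumption 1: every instance of a class must carry each property
                  whose rdfs:domain is that class ("missing").
    Assumption 2: an instance may carry only properties whose
                  rdfs:domain is one of its declared classes
                  ("unexpected").
    Neither rule follows from RDFS.  Resources with no declared
    rdf:type are not checked at all.
    """
    graph = set(triples)
    props_of = {}  # class -> properties declared with that domain
    for s, p, o in graph:
        if p == RDFS_DOMAIN:
            props_of.setdefault(o, set()).add(s)

    types, used = {}, {}  # resource -> declared classes / properties used
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p != RDFS_DOMAIN:
            used.setdefault(s, set()).add(p)

    problems = []
    for inst, classes in types.items():
        allowed = set().union(*(props_of.get(c, set()) for c in classes))
        present = used.get(inst, set())
        for prop in allowed - present:
            problems.append((inst, "missing", prop))
        for prop in present - allowed:
            problems.append((inst, "unexpected", prop))
    return sorted(problems)


data = [
    ("ex:author", RDFS_DOMAIN, "ex:Book"),
    ("ex:Garfinkle", RDF_TYPE, "ex:Cat"),
    ("ex:Garfinkle", "ex:author", "Fred Smith"),
    ("ex:MobyDick", RDF_TYPE, "ex:Book"),  # hypothetical Book with no author
]
violations = closed_world_violations(data)
```

Under these assumed rules the Cat with an author is reported as carrying an unexpected property, and the Book without an author is reported as missing one; a different application could reasonably adopt only one of the two rules, or neither.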

However, really the basis of all constraint checking is first 
determining that there's an anomaly:  the schema says this, and the 
instance data says that, and there's an apparent conflict.  What we're 
doing is leaving the conflict resolution to the application, rather than 
building it in (partly because we don't always have a way in RDFS to 
even state what constitutes a conflict, e.g., that Cats can't be Books).


> 2. The Multi-Class Domain Problem
> ----------------------------------
> Thanks, you provided a nice explanation of how this arises as a result of:
> 
> a) The above "Property implies Class" semantics, and
> b) Multiple rdfs:domain statements are individual assertions which must all be individually satisfied.
> 
> I continue to regard the result as a fatal flaw, but it now strikes me as a secondary problem. 
> 
> FWIW, if the only problem were the need to be able to spec that a Property can apply to instances of *any* of a list of classes rather than *all* of a list of classes, surely RDF has available syntax that the Schema spec could employ for this? (Maybe ALT fits in here... I haven't thought it through other than to hope that RDF is capable enough that this simple matter would be trivial to cast in RDF(S)...)
> 
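
The conjunctive reading described in (b) can be sketched in Python as follows.  This is an illustration only: the property ex:owner, the classes ex:Car and ex:House, and the resources ex:thing1 and ex:Pat are all hypothetical, and inferred_types is a made-up helper that applies rule rdfs2 once for each rdfs:domain statement.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def inferred_types(triples, subject):
    """Classes that rule rdfs2 assigns to `subject`.

    Each rdfs:domain statement on a property the subject uses
    contributes its own class, independently of the others.
    """
    graph = set(triples)
    used = {p for (s, p, _o) in graph if s == subject}
    return sorted(
        cls
        for (prop, p, cls) in graph
        if p == RDFS_DOMAIN and prop in used
    )


data = [
    ("ex:owner", RDFS_DOMAIN, "ex:Car"),    # hypothetical schema
    ("ex:owner", RDFS_DOMAIN, "ex:House"),  # hypothetical schema
    ("ex:thing1", "ex:owner", "ex:Pat"),    # hypothetical instance data
]
types = inferred_types(data, "ex:thing1")
```

Both classes come back for ex:thing1: the two domain statements behave as "and", not "or".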
> 3. "Property implies class": Revisited
> --------------------------------------
> 
> In the real world we often (usually?) classify objects based on their "properties":
> 
> 4-legs, furry, barks --> class = dog
> 4-legs, furry, meows --> class = cat
> flippers, furry, barks --> class = seal 
> 
> It is a *special case* where we can determine class membership based on only a single property. 
> 
> Hence, IMO, RDFS's prescription that each single property determines class membership independent of other properties is significantly at odds with the real world objects that RDF is designed to talk about. 
> 
> I'm increasingly convinced that this leads to several consequences (already noted) that prevent use of rdfs:domain and related features as the basis of any useful functionality at all.
> 
> Here's what I'd need to convince me otherwise -- and I'm hoping that somebody can point me to docs where these were already thought through in the process of devising the rdfs:domain etc features:
> 
> a) rdfs:domain Applied Usefully: An example where rdfs:domain does record the specs necessary to support *any* useful non-trivial constraining of properties to instances of particular classes. 
> 
> b) An Important Example Implemented: An example where constraints for a couple of tables (in the database sense), are specified by rdfs:domain statements. Particularly where the tables have a field/property in common. 
> 
> Such use cases would really be proof of the pudding... or proof that there's at least some pudding!
> 


I'm not sure what you're after here (particularly about (b)).  There are 
a number of existing applications that use (I assume "usefully") RDFS to 
define their structures, and I imagine they use the RDFS specifications, 
at least to some extent, to specify constraints just as you wish.  The 
point isn't that you *can't* use RDFS schemas in that way, it's just 
that we want the applications to make the decisions about how (and to 
what extent) to resolve these issues, and I assume they do so in a way that
suits their intended purposes.  Certainly it's possible (and not a 
"violation" of RDFS) to build an application that would use the example 
information we've been discussing to conclude that what someone's said 
about Garfinkle having an author must be wrong.

Is this helping?


--Frank



-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
Received on Thursday, 29 August 2002 08:34:00 UTC