Re: RDF Issue rdfs-clarify-subClass-and-instance from Frank Manola on 2002-08-28 (www-rdf-comments@w3.org from July to September 2002)

From: Frank Manola <fmanola@mitre.org>
Date: Wed, 28 Aug 2002 16:52:49 -0400
To: graham wideman <graham@wideman-one.com>
CC: Brian McBride <bwm@hplb.hpl.hp.com>, www-rdf-comments@w3.org, phayes@ai.uwf.edu
Message-ID: <3D6D3821.EC72206C@mitre.org>
graham wideman wrote:
> 
> Frank (because this is about the Primer) and Folks:
> 
> In continuing my stint as test reader of RDF docs (though I'm becoming less a virgin reader by the hour :-) ... a success and a serious setback to report...
> 
> 1. Progress in Understanding, and Why
> -------------------------------------
> First, I feel that I'm making some solid progress firming up what RDFS is all about, certainly partly from the various emails that have been generated (thanks all!). In addition, particularly helpful was following the thread of messages following Frank's:
> 
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0173.html
> 
> ... which gave me some confidence that I can understand RDF as a system that can "stand alone" independent of RDFS.  More importantly it clarified that what RDFS adds is not just another namespace-worth of elements and their semantics, but also additional semantics for "already existing" RDF elements, notably rdf:type.
> 
> In particular, it made sense out of doc statements like "rdf:type requires a value which is an instance of rdfs:Class".  I'm now seeing that this probably means something more like: "rdf:type requires a value which in RDFS would be regarded as an instance of rdfs:Class".

Good.

> 
> 2. Showstopping Setback: rdfs:domain *can't* work like that, can it?
> --------------------------------------------------------------------

Can't work like what?  See below.

> In the process of firming up my understanding of RDFS's take on types, classes and properties, I've stumbled across what appears to be a 180 degree switcheroo between the current WD of the Primer (2002-04-26) and the Editors' WD (2002-08-23).  In fact, it's so conspicuous (and presumably reflects some well-discussed underlying change in RDF principles) that I almost feel stupid pointing it out, but:

The change isn't in RDF principles;  it's more like I didn't think folks
would really get what's going on here if we continued to refer to
"constraints", nor would they get it if we sort of "glossed it over", so
I rewrote the whole thing with additional detail (based on some recent
discussion, I think some more rewriting will be necessary too).  I admit
it's a sea change from a lot of usual "schema" notions though.

> 
> Current WD section 4.2 says:     rdfs:domain specifies the classes whose instances a particular property may be applied to. The text is clear that this feature is intended to support constraints.
> 
> Editors' Draft (newer) section 5.2 says:    rdfs:domain specifies that if an instance has a particular property, then it's to be regarded as belonging to the designated class. Furthermore, if the property has multiple domain classes, then an instance with this property is to be regarded as a member of *all* those classes. (The text of 5.3 says that this mechanism may be further interpreted by an application however the application likes, one possibility being as a constraint... but see below...)
> 
> Summary:
> Old: Class membership implies something about properties of instances
> New: Properties of instances imply class membership
> 
> Now, suppose I want to use RDFS for the mundane task of specifying structure and data of a few boring tables. I initially think that I can use rdfs:domain as part of my specification of which classes can have which properties.
> 
> The old tutorial would lead me to believe that I *might* expect an RDFS processor to understand the constraints and enforce them.  The new tutorial lets me know that enforcement of constraints is not within the scope of RDFS. OK, I have to provide the constraint implementation, but I at least see how I can use rdfs:domain to specify the constraints.


That's right.  The domain and range declarations still *support*
constraints (if you want to use them that way), they just don't
*mandate* constraints.

> 
> But, so far as I can see, according to the *new* tutorial, the RDFS-prescribed semantics of rdfs:domain basically *precludes* me from using rdfs:domain even as the *basis* for the constraints.

No, this isn't right.  The semantics given for rdfs:domain don't
*preclude* you from using it as a constraint any more than they
*require* that you use it as a constraint.

> 
> First of all, if RDFS *prescribes* that possession of a particular property implies membership in a particular class, then an RDFS processor is violating RDFS if it chooses to interpret the association the other way around as a constraint.  Ie: the constraint mechanism *can't* regard an incoming instance to be in possession of an illegal property, if by RDFS rules, the presence of that property automatically makes the instance a member of a class that *does* permit such a property! (Of course, if that property is not defined anywhere, an app could rule it illegal due to no definition -- but that's not employing an rdfs:domain constraint.)

Let's take the example used in the Primer to illustrate the situation
you're describing.  Suppose I say in the schema that the ex:author
property has a domain of ex:Book, i.e., the schema states the triple: 

ex:author rdfs:domain ex:Book

The Primer says that this means that (a) ex:author is a property, (b)
ex:Book is a class, and (c) any resource that has an ex:author property
is an instance of class ex:Book (less formally what the schema says is,
"'author' is a property that applies to books").  So now suppose I come
in with the RDF information:

ex:Garfinkle rdf:type ex:Cat
ex:Garfinkle ex:author "Fred Smith"

and try to apply the schema domain specification of ex:author to this. 
What you're saying happens is that the processor (note:  I didn't say
"RDFS processor") takes the triples:

ex:author rdfs:domain ex:Book
ex:Garfinkle ex:author "Fred Smith"

and concludes 

ex:Garfinkle rdf:type ex:Book

i.e., that Garfinkle must be a Book (even though you've just said he was
a Cat), *instead of* concluding that someone has screwed up somewhere.
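
The inference step described above can be sketched in a few lines of Python. This is only an illustration: triples are modelled as plain tuples, no RDF library is assumed, and the function name domain_entailments is made up for this example. It shows what the domain declaration licenses, not what any processor is obliged to do.

```python
# Triples are plain (subject, predicate, object) tuples; this is a sketch,
# not an RDF toolkit API.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def domain_entailments(triples):
    """Return the rdf:type triples that rdfs:domain declarations license."""
    # Collect (property, class) pairs from the domain declarations.
    domains = [(s, o) for (s, p, o) in triples if p == RDFS_DOMAIN]
    inferred = set()
    for prop, cls in domains:
        for (s, p, o) in triples:
            if p == prop:
                # Any subject using the property may be typed by the domain class.
                inferred.add((s, RDF_TYPE, cls))
    return inferred


schema = {("ex:author", "rdfs:domain", "ex:Book")}
instance = {
    ("ex:Garfinkle", "rdf:type", "ex:Cat"),
    ("ex:Garfinkle", "ex:author", "Fred Smith"),
}

licensed = domain_entailments(schema | instance)
print(licensed)  # {('ex:Garfinkle', 'rdf:type', 'ex:Book')}
```

Note that the sketch only computes the licensed conclusion; it neither removes the ex:Cat statement nor flags an error.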

Discussion: 

It's important to separate who knows or does what here.  All RDFS by
itself does is say what it's been told, which is that *as far as it
knows*, if a resource has an author, it's a book.  It's not by itself a
reasoning engine that will take your instance data, and what it knows,
and conclude from

ex:author rdfs:domain ex:Book
ex:Garfinkle ex:author "Fred Smith"

that Garfinkle must be a book.  The RDFS that's been specified may
license that conclusion, but RDFS doesn't require that it be made
(let alone that you must prefer that conclusion to other 
information you have that may contradict it).  Note BTW that there's no
explicit information given anywhere in the example that says a Cat can't
also be a Book.  You may know that yourself (cats might have other
ideas!), but what we're trying to nail down here is what is said
explicitly.

What your *processing application* (the one that might try
to enforce constraints) is effectively presented with is

instance data      ex:Garfinkle rdf:type ex:Cat
instance data      ex:Garfinkle ex:author "Fred Smith"

schema data        ex:author rdfs:domain ex:Book

Your processing application is the one that will decide how it ought to
use this information, since it knows what priority to give to what
(RDFS, on the other hand, doesn't).  That is, your processing
application could say a number of things:

1.  The instance data must be wrong:  Garfinkle must be a Book rather
than a Cat
2.  The instance data must be wrong:  we'll say that it's invalid
3.  The schema must be wrong:  Cats can clearly have authors too
4.  Everyone's right:  Garfinkle is both a Book and a Cat
...
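
The four options above can be sketched as policies inside a processing application. Everything here is hypothetical: the function resolve and the policy names "retype", "reject", "trust_instance" and "accept_both" are invented for illustration and come from neither RDFS nor the Primer. The point is only that the choice lives in the application.

```python
POLICIES = {"retype", "reject", "trust_instance", "accept_both"}


def resolve(instance, schema, policy):
    """Apply one (hypothetical) application policy to domain/type mismatches."""
    if policy not in POLICIES:
        raise KeyError(policy)
    result = set(instance)
    for (prop, pred, cls) in schema:
        if pred != "rdfs:domain":
            continue
        for (s, p, o) in instance:
            if p != prop or (s, "rdf:type", cls) in instance:
                continue  # property not used here, or the types already agree
            if policy == "retype":
                # Option 1: drop the declared types and use the domain class.
                result = {t for t in result
                          if not (t[0] == s and t[1] == "rdf:type")}
                result.add((s, "rdf:type", cls))
            elif policy == "reject":
                # Option 2: treat the instance data as invalid.
                raise ValueError(f"{s} has {prop} but is not declared a {cls}")
            elif policy == "accept_both":
                # Option 4: keep the declared type and add the inferred one.
                result.add((s, "rdf:type", cls))
            # Option 3 ("trust_instance"): ignore the schema, change nothing.
    return result


schema = {("ex:author", "rdfs:domain", "ex:Book")}
instance = {
    ("ex:Garfinkle", "rdf:type", "ex:Cat"),
    ("ex:Garfinkle", "ex:author", "Fred Smith"),
}

both = resolve(instance, schema, "accept_both")
retyped = resolve(instance, schema, "retype")
unchanged = resolve(instance, schema, "trust_instance")
```

The same instance and schema data go in each time; only the policy argument, which the application supplies, changes the outcome.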

> 
> Second, the multi-class domain seems a further insurmountable problem.  If, as in the tutorial's example, I happen to have one table (class) "Book", and another "MotorVehicle", and they both have a "weight" field (property), I don't appear to be able to specify this in RDFS without causing all records (instances) with a weight property to become instances of both classes.  Since the constraint I'd most obviously want to implement (in this tables scenario) is that each instance must have exactly a certain set of properties (fields) according to its class, the RDFS prescription ends up causing all instances to be required to have all properties -- obviously untenable.

There are two parts to this.  The first is that, as I noted earlier,
we're not *forcing* resources to become instances of classes by making
rdfs:domain declarations;  we're providing information about properties
(but we can't control how that information is used).  The situation you
raise about the interpretation of multi-class domain declarations is a
necessary consequence of what domain declarations actually say.  In
RDF(S), properties are first class things (binary relations).  If you
define a property ex:weight, you're essentially saying there's a
relation called ex:weight with domain and range columns.  If you then
say:

ex:weight rdfs:domain ex:Book
ex:weight rdfs:domain ex:Motorvehicle

you've said "anything in the domain column of ex:weight is a Book" and
(simultaneously) "anything in the domain column of ex:weight is a
Motorvehicle".  For both of those things to be true, the things in the
domain column must be both Books and Motorvehicles.  But again, that's
what RDFS allows you to conclude, but it doesn't force you to (nor does
it force you to use the conclusion even if you draw it).
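
As a rough sketch of the conjunctive reading just described, the following Python collects every class that the domain declarations license for one resource. The resource ex:item1, its weight value and the function name inferred_classes are assumptions made up for the example.

```python
def inferred_classes(subject, triples):
    """Classes licensed for `subject` by the domains of the properties it uses."""
    used = {p for (s, p, o) in triples if s == subject}
    return {cls for (prop, pred, cls) in triples
            if pred == "rdfs:domain" and prop in used}


triples = {
    ("ex:weight", "rdfs:domain", "ex:Book"),
    ("ex:weight", "rdfs:domain", "ex:Motorvehicle"),
    ("ex:item1", "ex:weight", "1200"),  # hypothetical instance data
}

classes = inferred_classes("ex:item1", triples)
print(sorted(classes))  # ['ex:Book', 'ex:Motorvehicle']
```

Both classes come out because each declaration holds independently; whether an application then acts on that result is a separate decision.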

> 
> Surely this can't be the intent?
> 
> (a) It seems to preclude this mundane but bread-and-butter case
> 
> (b) I don't understand even the rationale for prescribing that class membership be deduced from properties. Why are we expecting RDFS instances to appear without an explicit indication of what class they belong to?

Again, we're not *prescribing* that class membership be deduced from
properties;  we're saying you can use the property information that way
if you want.  However, I assume you're OK with the idea of deriving
*additional* classes from any information that's there, right (like
deriving that foo is a MotorVehicle if the instance data says it's a
Van)?  Also, we're on the Web, and that may mean that I may have only
partial instance data, with no assurance I'll ever find any more, or
there may simply be no class indication.  We can't realistically prevent
that.    
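
The Van example can be sketched the same way. The code assumes the schema contains the triple ex:Van rdfs:subClassOf ex:MotorVehicle, which is not written out in this message; the helper names superclasses and all_types are also invented for the illustration.

```python
def superclasses(cls, triples):
    """Follow rdfs:subClassOf links transitively, starting from cls."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def all_types(subject, triples):
    """Declared types of `subject` plus the classes derivable from them."""
    declared = {o for (s, p, o) in triples
                if s == subject and p == "rdf:type"}
    derived = set()
    for cls in declared:
        derived |= superclasses(cls, triples)
    return declared | derived


triples = {
    ("ex:Van", "rdfs:subClassOf", "ex:MotorVehicle"),  # assumed schema triple
    ("ex:foo", "rdf:type", "ex:Van"),
}

types = all_types("ex:foo", triples)
print(sorted(types))  # ['ex:MotorVehicle', 'ex:Van']
```

A resource with no rdf:type statement at all simply yields an empty set here, which is the partial-data situation mentioned above.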


> 
> To take a different run at this: The text in 5.3 "Interpreting RDF Schema Declarations" goes to great length to tout the new emphasis on the *descriptive* orientation of RDFS, and the de-emphasis on the *prescriptive* orientation.
> 
> So from April to August, we seem to have thrown out one of the features that you would *most* expect a thing called a "Schema" to provide  -- the ability to assert in a way that's understood consistently everywhere (even if not enforced) the association of particular properties with instances of particular classes.   Meanwhile, we have *added* some bizarre *prescriptions* that appear to prevent us from privately interpreting the rdfs:domain mechanism to support implementing constraints even within a particular application.
> 

Not quite.  Your reading seems to be that we've thrown out one set of
prescriptions in order to adopt another (wronger) set.  What we've
actually tried to do is throw out prescriptions (or at least leave them
to applications) altogether.  Do you have any suggestions as to how this
might be better described (assuming any of this has made things
clearer)?

--Frank


-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-8752
Received on Wednesday, 28 August 2002 16:57:18 UTC