Re: Some thoughts on the semantics of domain and range (was: Re: RDFS bug "A property can have at most one range property") from Dan Brickley on 2000-09-14 (www-rdf-interest@w3.org from September 2000)

From: Dan Brickley <Daniel.Brickley@bristol.ac.uk>
Date: Thu, 14 Sep 2000 10:13:31 +0100 (BST)
To: Guha <guha@guha.com>
cc: Natalya Fridman Noy <noy@SMI.Stanford.EDU>, www-rdf-interest@w3.org
Message-ID: <Pine.GHP.4.21.0009140906370.2047-100000@mail.ilrt.bris.ac.uk>
On Wed, 13 Sep 2000, Guha wrote:

> I wasn't trying to defend Cyc's choice (though I probably should,
> since I made it). I was just trying to explain RDFS.

I think you've very well explained the spirit rather than the letter of RDFS.

	http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
	http://www.w3.org/TR/2000/CR-rdf-schema-20000327/#s3.1

rdfs:range: "A property can have at most one range property"

This (see TimBL complaint) is problematically couched, though a
rewording should be easy. Rather than express the conjunctive semantics
of range in terms of disallowing multiple rdfs:range statements about a
resource, we should simply be clearer about what these rdfs:range
statements mean, ie. what we can infer when we see them.


rdfs:domain 
 "A property may have zero, one, or more than one class as its domain. If
	there is no domain property, it may be used with any resource. If there
	is exactly one domain property, it may only be used on instances of that
	class (which is the value of the domain property). If there is more than
	one domain property, the constrained property can be used with instances
	of any of the classes (that are values of those domain properties). "

Here we've a more serious issue at stake. The *any* rather than *all*
here was a WG decision made before the introduction of the
rdfs:subPropertyOf construct. At that time, RDFS had no mechanism to
describe specialization hierarchies amongst property types, ie. we
couldn't say how the s2:techAuthor property related to
s1:author property. So a major motivation for the weak
(void) semantics of rdfs:domain was to avoid having dozens of trivially
different properties defined by folk around the Web, and no way to
relate them. We then added rdfs:subPropertyOf...


For instance, take the authors/Persons/Documents example. We can say
domain(author,Document).
Someone else might subsequently grumble that this was a bad bit of
modelling, since (say) Songs can also have an author. Now if they want
to re-use the 'author' relation on some resource of type Song, it'll
(with conjunctive semantics) imply that their song is also a document.

And for the sake of this example, people don't like that. So they go create
a new similar-but-not-the-same property s2:songAuthor. Since this was
before rdfs:subPropertyOf, this was seen as a bad thing, so we weakened
(voided) the semantics of rdfs:domain to encourage property re-use at
the expense of supporting inference.
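
A minimal sketch of the conjunctive reading that the Song author would
object to, in Python. The triple representation, the function name and the
ex: identifiers are assumptions made for illustration; they come from
neither the RDFS spec nor any particular implementation.

```python
# Triples are plain (subject, predicate, object) tuples; all names below
# are hypothetical and chosen only to mirror the author/Song example.
DOMAIN = "rdfs:domain"
TYPE = "rdf:type"

def infer_domain_types(triples):
    """Conjunctive reading: every rdfs:domain declared for a property
    becomes an rdf:type of any resource the property is used on."""
    domains = {}
    for s, p, o in triples:
        if p == DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, TYPE, cls))
    return inferred

triples = [
    ("s1:author", DOMAIN, "s1:Document"),
    ("ex:someSong", TYPE, "ex:Song"),
    ("ex:someSong", "s1:author", "ex:somePerson"),
]
inferred = infer_domain_types(triples)
```

Here the song ends up typed as an s1:Document purely because s1:author was
used on it, which is the consequence the disjunctive wording was meant to
avoid.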

So, the argument at the time was that disjunctive semantics will lessen
this tendency to have lots of redundantly similar properties (author,
s2:songAuthor, s3:technicalAuthor). From what I've seen of RDF
implementations, people (including yourself) are often assuming the spec
says the expected thing, rather than what it actually says. Combine that
with the existence of rdfs:subPropertyOf, which provides some basis for
hanging together hierarchical families of similar-but-not-the-same rdf
properties (author, songAuthor, techAuthor etc), and I think there's a
case to be made for tightening the semantics of rdfs:domain.





> On the net, disjunctive semantics for range/domain can be a real
> problem. Lets assume that we know that the domain for an arc foo
> is the Class Bar and that a resource baz has the arc foo. With
> disjunctive semantics, the domain information is pretty much
> useless to us since it does not allow us to conclude anything. After
> all, somewhere out there on the web could be another domain
> statement for foo. We could make a CWA and conclude that
> baz is an instance of Bar and if we ever find another domain
> statement for foo, retract our original conclusion.

(decoder: CWA=closed world assumption. We could do with an RDF IG glossary)

Yep. Same with rdfs:range as currently worded, less obviously. We find
ourselves retracting rdfs:range beliefs to meet the 'at most one'
constraint. (Or else not using rdfs:range to represent range
constraints that apply to a property by virtue of applying to one of
its super properties.)

You might believe the (an) rdfs:range of techAuthor to be Person, since 
subPropertyOf(techAuthor,author). 
range(author,Person).



Scenario:

(I've gone into slightly more detail here than you probably want/need,
but wanted to walk through a plausible use case to give this discussion
a bit of context...)

Say you have the well known s1 Schema (defining author) plus a 'what's
new' RDF file for that schema (eg. acquired via HTTP or signed Usenet
msg; whatever) which mentions that some folks elsewhere
have defined these exciting new subPropertiesOf s1:author, and that
more info about s2:techAuthor is available from URI reference
http://s2.example.com/#techAuthor (we can use rdfs:seeAlso to say this)

[aside: this is the sort of scenario we're getting into in a real way
with Dublin Core qualification models and relationship to other specs,
eg. DC <-> RSS, DC<->IMS/LOM etc. ]

So what do you know at this point? That techAuthor is a subPropertyOf
author, and that range of author is Person. While the spec doesn't call
out that techAuthor has a range of Person, we want to act as if that
were the case.

Back with the example scenario. Having read just the s1 schema and the
"What's new with S1 applications" machine-readable bulletin, I should be
able to deal sensibly when I stumble across
s2:techAuthor(guhas-phd, anon1).
s2:personalMbox(anon1, mailto:guha@guha.com).
...and infer that anon1 must be of rdf:type Person. Without reading the 
s2 Schema. 

So... I'm acting as if range(techAuthor,Person) is true.
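
A rough Python sketch of that "acting as if" step: walk up the
subPropertyOf links and collect every range found along the way. The
dictionaries and function names are hypothetical; this only illustrates
the inference described here and is not how any given RDF toolkit does it.

```python
def super_properties(prop, sub_property_of):
    """Return prop together with everything reachable via subPropertyOf."""
    seen, stack = set(), [prop]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(sub_property_of.get(p, ()))
    return seen

def effective_ranges(prop, sub_property_of, ranges):
    """Ranges stated for prop itself plus those of its super-properties."""
    out = set()
    for p in super_properties(prop, sub_property_of):
        out |= ranges.get(p, set())
    return out

# What we know from the s1 schema and the "what's new" bulletin only.
sub_property_of = {"s2:techAuthor": {"s1:author"}}
ranges = {"s1:author": {"s1:Person"}}

# s2:techAuthor(guhas-phd, anon1)
subject, prop, target = ("guhas-phd", "s2:techAuthor", "anon1")
types_of_target = effective_ranges(prop, sub_property_of, ranges)
```

With only those two sources loaded, types_of_target contains s1:Person,
so anon1 is treated as a Person without the s2 schema ever being fetched.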

Then I finally get around to reading the s2 schema (say I encounter so
many s2 vocab constructs in use that some heuristic is tripped and my
rdf robot goes and gets the schema). I find out that it asserts
range(techAuthor, Engineer). Interesting. I now know even more.

Now since RDFS CR (prior to all this implementor feedback) says
"A property can have at most one range property" we have to rollback
from what we previously believed, and remove any trace of 
range(techAuthor,Person) from our database.

Kind of quirky way of specifying conjunctive semantics on rdfs:range. 
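
To make the contrast concrete, here is a small Python sketch of the two
readings side by side. Both classes are hypothetical toy stores invented
for this illustration; neither is taken from the spec or from an existing
implementation.

```python
class AtMostOneRange:
    """The CR wording: one range per property, so a new range statement
    displaces whatever we believed before."""
    def __init__(self):
        self.ranges = {}

    def assert_range(self, prop, cls):
        previous = self.ranges.get(prop)
        self.ranges[prop] = cls
        # Report the belief we had to roll back, if any.
        return previous if previous != cls else None

class ConjunctiveRange:
    """Conjunctive reading: range statements accumulate and all apply."""
    def __init__(self):
        self.ranges = {}

    def assert_range(self, prop, cls):
        self.ranges.setdefault(prop, set()).add(cls)
        return None  # nothing is ever retracted

single = AtMostOneRange()
single.assert_range("s2:techAuthor", "s1:Person")      # inherited belief
rolled_back = single.assert_range("s2:techAuthor", "s2:Engineer")

conj = ConjunctiveRange()
conj.assert_range("s2:techAuthor", "s1:Person")
conj.assert_range("s2:techAuthor", "s2:Engineer")
```

In the first store, reading the s2 schema forces s1:Person out; in the
second, anon1 is simply both a Person and an Engineer, and learning more
never requires forgetting anything.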

Less of a big deal to fix than rdfs:domain. It'll affect implementations
but not modelling styles.



> The place where the single domain/range requirement hurts is
> when the range or domain of an arc tends to be "oddly shaped",
> i.e., not have a single class corresponding to it, but is a conjunct
> or disjunct of multiple classes where the conjunct/disjunct does
> not define a "natural kind" (as in Quine's use of the term "natural
> kind"). Non-natural kinds are a problem all around.

Yep. 

So what do we do? 

Dan

> 
> Allowing multiple domains/ranges with conjunctive semantics solves
> some of these problems without introducing mentioned earlier, but
> we are still left with the other problem mentioned earlier.
> 
> We could make a CWA and conclude that
> baz is an instance of Bar and if we ever find another domain
> statement for foo, retract our original conclusion. But the whole
> non-monotonic reasoning using a TMS for dependency maintenance
> game needs to be revisited in the context of inferencing on the web.
> That game was defined based on a KB update model that does
> not necessarily apply here.
> 
> guha
> 
> Natalya Fridman Noy wrote:
> 
> > At 10:56 AM 09/13/2000 -0700, guha wrote:
> > >rdfs:domain and rdfs:range were modelled after the similarly named
> > >concepts
> > >in Cycl and have had very well defined meanings right from the beginning.
> > >
> > >(rdfs:domain ?arc ?domain) ^ (?arc ?source ?target) => (rdf:type ?source
> > >?domain)
> > >and
> > >(rdfs:range ?arc ?range) ^ (?arc ?source ?target) => (rdf:type ?target
> > >?range)
> > >
> > >and thats it.
> >
> > Actually, Cyc's conjunctive semantics  for domains and ranges can (and
> > does) force modeling choices that sometimes make the whole concept of
> > domain and range practically useless. Here is an example (if memory serves,
> > it comes directly from Cyc). Consider the domain of a property
> > wearingSomething. A natural domain would be Person. However, dogs can also
> > wear something, so we have to make Animal a domain of wearingSomething (by
> > the by, allowing lions to wear things as well). In addition, manikins can
> > wear something. Now we have to go up to TangibleThing making the
> > declaration of domain essentially useless. Similar argument holds for range.
> >
> > This conjunctive semantics for domains and ranges in Cyc was in fact a
> > problem for the OKBC systems when Cyc knowledge bases were translated into
> > frame-based OKBC-compatible KR systems such as Ontolingua and Protege:
> > Since domains and ranges of properties had to be maximally general,
> > high-level classes had hundreds of slots (properties) that had little
> > meaning for that class.
> >
> > In fact, OKBC adopted the disjunctive semantics for domains and ranges of
> > slots (perhaps, for practical reasons), and it seemed to work well there.
> >
> > Natasha
> 
>
Received on Thursday, 14 September 2000 05:13:40 UTC