Re: Some thoughts on the semantics of domain and range (was: Re: RDFS bug "A property can have at most one range property") from Dan Brickley on 2000-09-14 (www-rdf-interest@w3.org from September 2000)

From: Dan Brickley <danbri@w3.org>
Date: Thu, 14 Sep 2000 06:27:58 -0400 (EDT)
To: Lee Jonas <ljonas@acm.org>
cc: www-rdf-interest@w3.org
Message-ID: <Pine.LNX.4.21.0009140529120.4657-100000@tux.w3.org>
On Thu, 14 Sep 2000, Lee Jonas wrote:

> This touched a raw nerve for me.  

Thanks for the feedback. 

> It is definitely not useless.  It is an essential part of RDF-Schema that
> makes assertions about RDF model validity.

Can you walk us through an example of how rdfs:domain as currently
defined allows you to validate some RDF model?


> The argument seems to be:
> a) certainty when inferring a resource's type from the predicates it
> exhibits
> vs
> b) being able to define model validity constraints in a far more refined and
> accurate way.
> 
> Arguments:
> * b) is fundamental to RDF-Schema and a) is an adjunct
> * a resource's type will be given explicitly in a lot of cases
> * In the absence of an explicit type, and more than one rdfs:domain you can
> always infer that the resource is of type rdfs:Resource.


Yes, a resource's type (or one of its types) will often be given. The Web
has a single space of resource identifiers (URIs) but allows many
different ways of carving that space up into classes. Correspondingly,
RDFS allows a resource to be described as being a member of many
classes, not necessarily from the same hierarchy. So there are likely to
be more rdf:type statements inferrable than explicitly represented when a
resource is mentioned.


> * b) is fundamental to RDF-Schema and a) is an adjunct
[...]
> 
> As defining model validity is RDFS's primary purpose, and inferring types is
> a secondary concern I strongly feel that b) is the way to go.

RDFS does address this (validity; constraint checking), but the spec
doesn't make this its sole concern. 

	RDFS CR Abstract:
	This specification describes how to use RDF to describe RDF
	vocabularies. The specification also defines a basic vocabulary for this
	purpose, as well as an extensibility mechanism to anticipate future
	additions to RDF. 


> 
> >After all, somewhere out there on the web could be another
> >domain statement for foo. We could make a CWA and conclude
> >that baz is an instance of Bar and if we ever find another
> >domain statement for foo, retract our original conclusion.
> 
> Indeed.
> 
> The point about this is that you can never enforce a single domain (or
> indeed range) constraint  on every Property definition with such an open RDF
> model (where anything can be said about anything) in such an open
> environment as the Web (where anyone can say anything).

IMHO it's not about enforcing in the sense of policing and punishment
while believing any and all RDF statements found on the Web. It's more about: if I
believe the RDF statements I find in Schema s1 and Schema s2, what can I
conclude about this heap of instance data?
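
To make that question concrete, here is a minimal sketch in Python. It
assumes the conjunctive reading discussed in this thread (every domain or
range statement we believe for a property applies to every use of that
property), which is not how rdfs:domain is currently defined. The schema
contents, the class names s1:Document and s2:TechReport, and the ex:
resources are all invented for illustration.

```python
def infer_types(triples, schema_statements):
    """Return {resource: set of classes} implied by the schema statements
    we choose to believe, under the (assumed) conjunctive reading."""
    domains, ranges = {}, {}
    for prop, constraint, cls in schema_statements:
        if constraint == "rdfs:domain":
            domains.setdefault(prop, set()).add(cls)
        elif constraint == "rdfs:range":
            ranges.setdefault(prop, set()).add(cls)

    inferred = {}
    for subject, predicate, obj in triples:
        # Every believed domain class applies to the subject...
        for cls in domains.get(predicate, ()):
            inferred.setdefault(subject, set()).add(cls)
        # ...and every believed range class applies to the object.
        for cls in ranges.get(predicate, ()):
            inferred.setdefault(obj, set()).add(cls)
    return inferred


# Hypothetical schemas s1 and s2, both of which we decide to believe.
s1 = [("s2:techAuthor", "rdfs:domain", "s1:Document")]
s2 = [("s2:techAuthor", "rdfs:domain", "s2:TechReport"),
      ("s2:techAuthor", "rdfs:range", "s2:Engineer")]

# A small heap of instance data (also hypothetical).
data = [("ex:doc1", "s2:techAuthor", "ex:alice")]

types = infer_types(data, s1 + s2)
```

Nothing is rejected here; believing a further schema only adds to what we
conclude about the instance data.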



Another scenario that favours the conjunctive approach (while
acknowledging the inference/expressivity tradeoff):

Say you're an RDF query processor, and you get a query like
	
	dc:title(X,Y), s2:techAuthor(X,Y), s2:mbox(Y,Z) etc etc

If RDFS semantics allow us to conclude from
rdfs:range(s2:techAuthor,s2:Engineer) that Y must be of type s2:Engineer, we
can use that as a query-planning hint when deciding how best to consult
the database. Say the database keeps stats based on known type membership,
e.g. that it believes 3, or 30, or 30,000 resources to be of type Person,
Engineer, Document etc. We can use rdfs:range (though not rdfs:domain as
currently defined) to help us answer queries more efficiently. These are
only hints, btw; there might be some resource known to the database that
matches 'Y' in the query above without our having any type information
about it.
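
A rough sketch of how such a hint might be computed, in Python. The
function names, the triple-pattern representation, and the statistics are
all assumptions made up for this illustration; no particular query
processor works this way as far as this message claims.

```python
def range_hints(query, ranges):
    """Map each object variable to the classes suggested by rdfs:range.

    query  : list of (predicate, subject_var, object_var) patterns
    ranges : {predicate: class}, the range statements we believe
    """
    hints = {}
    for predicate, _subject_var, object_var in query:
        if predicate in ranges:
            hints.setdefault(object_var, set()).add(ranges[predicate])
    return hints


def estimate(var, hints, type_counts):
    """Estimated number of candidates for var, or None if we have no hint.

    This is only a hint: resources with no recorded type may still match.
    """
    counts = [type_counts[c] for c in hints.get(var, ()) if c in type_counts]
    return min(counts) if counts else None


# The query from the message, as (predicate, subject, object) patterns.
query = [("dc:title", "X", "Y"),
         ("s2:techAuthor", "X", "Y"),
         ("s2:mbox", "Y", "Z")]

# Hypothetical schema knowledge and database statistics.
ranges = {"s2:techAuthor": "s2:Engineer"}
type_counts = {"s2:Person": 30000, "s2:Engineer": 30, "s2:Document": 3}

hints = range_hints(query, ranges)
y_estimate = estimate("Y", hints, type_counts)
x_estimate = estimate("X", hints, type_counts)
```

A planner could then choose to bind 'Y' first, since it is the only
variable with a small estimated candidate set, while falling back to an
untyped scan for variables whose estimate is None.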



> >The place where the single domain/range requirement hurts
> >is when the range or domain of an arc tends to be
> >"oddly shaped", i.e., not have a single class corresponding
> >to it, but is a conjunct or disjunct of multiple classes
> >where the conjunct/disjunct does not define a
> >"natural kind" (as in Quine's use of the term "natural
> >kind"). Non-natural kinds are a problem all around.
> 
> I'm not sure what you mean.  What is a "natural kind"?

Philosophy / Logic / Cognitive Science jargon. The basic idea being that
our understanding of the meaning of many natural language terms (water,
apples, people, planets, etc.) is made easier because the world seems to us
to be carved up according to such categories ("natural kinds"), whereas
other categories seem more ad-hoc and artificial, ie. artifacts of
language rather than being names that point to "real
world" categories. This notion relates to schema usability in cases where
schema language semantics force vocabulary designers to define ad-hoc
placeholder classes that don't intuitively feel like they relate to
anything "real". Note that we can (thankfully) remain agnostic about the
question of whether the Universe really _does_ come pre-carved into
natural kinds; the issue is more to do with the psychology of
categorisation. To the extent that schemas define classes that key into
our intuitions about worldly categories, we can share a little
meaning. But where we find ourselves defining classes such as
'ReadableOrWearableObject', we've moved beyond the natural kinds. Where we
have a class such as 'Water','Lemon' etc., the class definition hooks more
easily into the unarticulated shared assumptions we all have about what
defines such a class, and into how the world is carved up.

We can mine natural language for useful categories; I started doing this
with an RDF representation of WordNet, which has since been elaborated on.
See
http://xmlns.com/wordnet/1.6/Water
http://xmlns.com/wordnet/1.6/Lemon
http://xmlns.com/wordnet/1.6/Planet
http://xmlns.com/wordnet/1.6/Person
for examples of natural-kind terms projected into RDF classes (slightly
broken RDF graphs, but you get the idea).


Non-natural-kind terms are more like queries, couched in terms of more
stable and intuitive categories.

eg...

	[[
	A lovely quote by Borges is given to demonstrate
	the sort of system that doesn't occur in the classification of
	animals: `a) those that belong to the emperor, b) embalmed ones, c)
	those that are trained.....'
	]] quoting Rosch in  	
	http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Foundations.Cognition/0058.html
	
Um... dunno if that helped. 



> In summary:
> * objectivity about all RDF statements on the entire Web is going to be
> impossible - subjectivity (relative to a 'frame of reference') is the only
> recourse, therefore conclude stuff based on your viewpoint.

But... this notion of a viewpoint conflates 'all the RDF statements I have
to hand and that I believe' with a subjective frame of reference that is
independent of the current set of statements one knows about. We all agree
that we won't believe all possible RDF statements at once, and that we won't
have them all loaded into a local database at once. But we want to reason
about some local collection of data while bearing in mind that there are
more facts out there that we'd believe given half the chance.

> * rdfs:domain is _very_ useful as it currently stands for asserting model
> validity (albeit based on your current frame of reference).

It would be really useful if you could walk us through your current
processing model w.r.t. rdfs:domain as currently defined.

cheers,

Dan
Received on Thursday, 14 September 2000 06:27:57 UTC