Re: RDF Keys, or why RDF is lousy at metadata annotations from Brian McBride on 2004-03-16 (www-rdf-comments@w3.org from January to March 2004)

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Tue, 16 Mar 2004 08:52:45 +0000
To: Bob MacGregor <macgregor@ISI.EDU>
Cc: www-rdf-comments@w3.org
Message-ID: <4056C05D.6080601@hplb.hpl.hp.com>

Hi Bob,

Quoting from the end of your message:

 > Note: I'm posting this to RDF comments because I'm not soliciting
 > debate on this issue.  Rather, I would like to see it added to the
 > issues list for the next RDF committee.

Whilst I'm happy to add things to RDFCore's postponed issues list, I
suggest the best course here is to approach the newly formed Semantic
Web Best Practices and Deployment Working Group.

Let me see if I have understood properly the point you are raising.
Leaving aside reification for the moment, I think you have a use case,
a generic problem, a proposed solution and reasons to prefer the
proposed solution over alternatives.  I might have gone a little beyond
what you have explicitly said in trying to fill some gaps I perceived.
Please feel free to correct me.

Use Case (adapted from a talk you gave at ISWC(?))

Consider the case of recording observations, say about birds.  Let's say
we have an ontology which defines a type bird:Observation.  There may be 
many observations and there is no natural URI to assign to each 
observation.  Each observation might be said to be uniquely identified 
by the values of two properties of the observation, say bird:ringNumber 
and bird:time.

Problem Statement:

It is desirable to be able to refer to specific observations from 
outside the graph containing an observation.  Whilst this might be done 
by some sort of query expression, there is no definition for such a 
query expression and it would be simpler to generate a URI for the 
observation.

Whilst one could generate an arbitrary URI when the observation is made,
it is possible that the 'same' observation might be made independently,
and it is desirable that the same URI be generated in each independent case.

Outline of a Proposed Solution:

The idea is to define for a given class that the values of a specific
set of its properties 'identify' instances of that class and to define a
function to generate a URI for such instances.

Put another way, given a function F that generates URIs from its arguments,

   _:o rdf:type bird:Observation ;
       bird:ringNumber _:n ;
       bird:time       _:t .

entails

   _:o owl:sameAs < F(base, _:n, _:t) > .
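
One possible shape for F, as a minimal sketch only: the name make_key_uri,
the example base URI and the "property=value" layout are all assumptions
made for illustration, not anything proposed in this thread.  It uses the
bird:ringNumber and bird:time key from the use case above, and sorts the
key properties so that the order in which they are supplied does not
affect the result.

```python
from urllib.parse import quote


def make_key_uri(base, key_values):
    """Hypothetical F: derive a URI from a base and a dict of key values.

    key_values maps each key property name to the value it has on the
    instance being identified.
    """
    parts = []
    # Sort by property name so the same key always yields the same URI,
    # whatever order the properties were supplied in.
    for prop in sorted(key_values):
        # Percent-encode names and values so that '=' or '&' inside a
        # value cannot be confused with the separators added here.
        name = quote(prop, safe="")
        value = quote(str(key_values[prop]), safe="")
        parts.append(name + "=" + value)
    return base + "&".join(parts)


# Two observers recording the 'same' observation independently:
first = make_key_uri(
    "http://example.org/obs/",
    {"bird:ringNumber": "A123", "bird:time": "2004-03-16T08:52:45Z"},
)
second = make_key_uri(
    "http://example.org/obs/",
    {"bird:time": "2004-03-16T08:52:45Z", "bird:ringNumber": "A123"},
)
print(first == second)
```

Both observers end up with the same URI, provided they agree on the base
and on how the key values are written down.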

Sets of properties that identify instances of a class could be described 
as a property of the class using RDF and the RDF collection mechanism.
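
To make that concrete, here is a rough sketch of what such a description
might expand to as triples.  The predicate name ex:hasKey and the blank
node labels are invented for the example; only rdf:first, rdf:rest and
rdf:nil come from the RDF vocabulary.  Triples are modelled as plain
Python tuples rather than with any particular RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list(items, prefix="_:k"):
    """Encode items as an RDF collection; return (head node, triples)."""
    if not items:
        return RDF + "nil", []
    # One blank node per list cell; the labels are arbitrary.
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # The last cell points at rdf:nil to close the list.
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


def declare_key(cls, key_properties, key_predicate="ex:hasKey"):
    """Attach a key (a collection of properties) to a class.

    key_predicate is a hypothetical property name, assumed here.
    """
    head, triples = rdf_list(key_properties)
    return [(cls, key_predicate, head)] + triples


for triple in declare_key("bird:Observation",
                          ["bird:ringNumber", "bird:time"]):
    print(triple)
```

Using a collection keeps each key's property set closed, so a class could
carry more than one key without the sets running together.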

Alternative Approaches:

1) Use a GENSYM mechanism to generate a URI when the observation is 
made.  This does not address the need for independently generating the 
same URI.

2) Use a generalization of owl:InverseFunctionalProperty.  This requires 
more complicated implementation mechanisms.
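
The weakness of alternative 1 can be illustrated with a small sketch.
The function names and the example base URI are assumptions for the
illustration; uuid4 stands in for a GENSYM mechanism and a name-based
uuid5 stands in for a key-derived URI.

```python
import json
import uuid


def gensym_uri(base):
    """GENSYM-style URI: unique, but different on every call."""
    return base + str(uuid.uuid4())


def key_based_uri(base, ring_number, time):
    """Key-derived URI: the same key values always give the same URI."""
    # Serialise the key values as a JSON list so that the boundary
    # between the two values is unambiguous.
    name = json.dumps([ring_number, time])
    return base + str(uuid.uuid5(uuid.NAMESPACE_URL, base + name))


base = "http://example.org/obs/"

# Two independent recordings of the 'same' observation.
print(gensym_uri(base) == gensym_uri(base))
print(key_based_uri(base, "A123", "2004-03-16T08:52:45Z")
      == key_based_uri(base, "A123", "2004-03-16T08:52:45Z"))
```

The first comparison is False and the second is True: only the
key-derived form lets independent parties arrive at the same URI.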

How well did I do?

My preference is for you to take this issue to the SWBPD working group
for consideration.  If you feel you have already done your bit and
don't have the energy for it, then I'll find a sponsor (default me) to
do it on your behalf.

Does that work for you?

Brian




Bob MacGregor wrote:


> 
> It's claimed that RDF and OWL are really great because they
> facilitate making metadata assertions.  The reality is that the
> RDF and OWL standards do a lousy job at supporting metadata
> annotations for (at least) two reasons.
> 
> To do a good job of annotating (attaching metadata to) something, you need
>    (1) to reify the something, and
>    (2) the URI needs to be globally unique, and it needs to be "repeatable".
> 
> By now it's well understood that RDF statement reification is a loser,
> because it chooses too small a grain size.  The best remedy is to
> add contexts and quads to RDF.  If we did that, then (1) is taken
> care of.  However, that's a subject for a different e-mail.
> 
> Here I'm really addressing the bnode problem.  The proper scope
> for a bnode is the model that it belongs to.  That means that
> to reference a resource/entity outside of the model, you need
> something other than a bnode--you need a resource with a globally
> unique URI.  It's easy, but quite useless for annotation purposes, to use
> "gensym" URIs, where you generate a unique URI on the fly, because
> the next time you load the model, you get a different URI.  In other
> words, the URI is not "repeatable".  To achieve repeatability, some
> misguided proposals suggest concatenating or hashing all of the values
> of attributes of a resource to create a repeatable URI.  If you do that,
> the attributes of a resource cannot be updated.
> 
> The right solution is to generate a URI based on a minimal set of attribute
> values that can be guaranteed not to change.  This is the definition of a
> "key" (or primary key).  Proper database tables have primary key
> definitions.  XML items don't have keys, but they should.  And RDF classes
> should define keys.
> 
> Defining a key on a class C in RDF is very simple.  A key consists of a
> set of properties--the order doesn't matter.  If P1 and P2 are properties
> that define a key for C, then we can invent a new property
> "hasKeyProperty" and make two statements:
> 
> C hasKeyProperty P1 .
> C hasKeyProperty P2 .
> 
> and we're done.  Now we have the foundation needed to synthesize
> unique and repeatable URIs.
> 
> Hence, my proposal for the follow-on to the current RDF is to define
> a new predicate equivalent to "hasKeyProperty".
> 
> Let me address two possible objections. One is that there may exist
> more than one set of properties that defines a key for a given class.
> The same is true for database systems, but they have wisely chosen
> to identify one key as "primary" and declare that one as "the" key.
> 
> Second, the strategy for forming a unique URI based on a set
> of key values is left open.  It would be REALLY useful if the
> committee also tackled this problem.
> 
> Note: I'm posting this to RDF comments because I'm not soliciting
> debate on this issue.  Rather, I would like to see it added to the
> issues list for the next RDF committee.
> 
> Cheers, Bob
> 
> 
> 
>
Received on Tuesday, 16 March 2004 03:53:45 UTC