RDF Keys, or why RDF is lousy at metadata annotations

It's claimed that RDF and OWL are really great because they
facilitate making metadata assertions.  The reality is that the
RDF and OWL standards do a lousy job at supporting metadata
annotations for (at least) two reasons.

To do a good job of annotating (attaching metadata to) something, you need
    (1) to reify the something, and
    (2) to give it a URI that is globally unique and "repeatable".

By now it's well understood that RDF statement reification is a loser,
because it chooses too small a grain size.  The best remedy is to
add contexts and quads to RDF.  If we did that, then (1) is taken
care of.  However, that's a subject for a different e-mail.

Here I'm really addressing the bnode problem.  The proper scope
for a bnode is the model that it belongs to.  That means that
to reference a resource/entity outside of the model, you need
something other than a bnode--you need a resource with a globally
unique URI.  It's easy, but quite useless for annotation purposes, to use
"gensym" URIs, where you generate a unique URI on the fly, because
the next time you load the model, you get a different URI.  In other
words, the URI is not "repeatable".  To achieve repeatability, some misguided
proposals suggest concatenating or hashing all of the values of attributes
of a resource to create a repeatable URI.  If you do that, the attributes
of a resource cannot be updated without changing its URI.
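
To make the two failure modes concrete, here is a minimal Python sketch.
The namespace http://example.org/id/, the function names, and the sample
record are illustrative assumptions of mine, not part of any RDF
specification or toolkit.

```python
import hashlib
import uuid

BASE = "http://example.org/id/"  # hypothetical namespace, for illustration only


def gensym_uri():
    """A 'gensym' URI: globally unique, but different on every call (every load)."""
    return BASE + uuid.uuid4().hex


def hash_all_uri(attrs):
    """A URI hashed from ALL attribute values: stable only until a value changes."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return BASE + hashlib.sha1(canonical.encode("utf-8")).hexdigest()


person = {"employeeId": "E1001", "name": "Bob", "email": "bob@example.org"}

# Two loads of the same model yield two different gensym URIs.
first_load, second_load = gensym_uri(), gensym_uri()

# Updating a single attribute changes the hashed URI, so any annotation
# attached to the old URI no longer points at the resource.
before = hash_all_uri(person)
after = hash_all_uri({**person, "email": "bob@example.com"})
```

In this sketch first_load and second_load never match, and before and
after differ even though they describe the same person.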

The right solution is to generate a URI based on a minimal set of attribute
values that can be guaranteed not to change.  This is the definition of a
"key" (or primary key).  Proper database tables have primary key definitions.
XML items don't have keys, but they should.   And RDF classes should define
keys.

Defining a key on a class C in RDF is very simple.  A key consists of a
set of properties--the order doesn't matter.  If P1 and P2 are properties
that define a key for C, then we can invent a new property "hasKeyProperty" and
make two statements:

C hasKeyProperty P1 .
C hasKeyProperty P2 .

and we're done.  Now we have the foundation needed to synthesize
unique and repeatable URIs.
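
As a rough illustration of how a tool might read such declarations,
the sketch below stores triples as plain Python tuples.  The "ex:"
names, the sample properties, and the key_properties helper are all
hypothetical; "hasKeyProperty" is the proposed predicate, not an
existing RDF or OWL term.

```python
# Triples as (subject, predicate, object) tuples; every name is illustrative.
triples = [
    ("ex:Person", "ex:hasKeyProperty", "ex:employeeId"),
    ("ex:Person", "ex:hasKeyProperty", "ex:country"),
    ("ex:Person", "rdfs:label", "Person"),
]


def key_properties(triples, cls):
    """Return the key properties declared for a class as a set (order is irrelevant)."""
    return frozenset(
        o for s, p, o in triples if s == cls and p == "ex:hasKeyProperty"
    )


person_key = key_properties(triples, "ex:Person")
```

Because the result is a set, declaring P1 before P2 or P2 before P1
gives the same key.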

Hence, my proposal for the follow-on to the current RDF is to define
a new predicate equivalent to "hasKeyProperty".

Let me address two possible objections. One is that there may exist
more than one set of properties that defines a key for a given class.
The same is true for database systems, but they have wisely chosen
to identify one key as "primary" and declare that one as "the" key.

Second, the strategy for forming a unique URI based on a set
of key values is left open.  It would be REALLY useful if the
committee also tackled this problem.
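
To show the kind of thing I have in mind, here is one possible
strategy, sketched in Python.  It is only an assumption about how this
could work: the key_uri function, the example namespace, the choice of
SHA-256, and the canonical form (percent-encoded name=value pairs
sorted by property name) are mine, and a committee might well choose
differently.

```python
import hashlib
from urllib.parse import quote


def key_uri(base, cls, key_props, resource):
    """Build a URI from the class name and the resource's key values only."""
    missing = [p for p in key_props if p not in resource]
    if missing:
        raise ValueError(f"missing key values: {sorted(missing)}")
    # Sort by property name so the result does not depend on declaration order.
    canonical = "&".join(
        f"{quote(p, safe='')}={quote(str(resource[p]), safe='')}"
        for p in sorted(key_props)
    )
    digest = hashlib.sha256(f"{cls}?{canonical}".encode("utf-8")).hexdigest()
    return f"{base}{digest}"


key = {"ex:employeeId", "ex:country"}
bob = {"ex:employeeId": "E1001", "ex:country": "US", "ex:email": "bob@example.org"}

uri_v1 = key_uri("http://example.org/id/", "ex:Person", key, bob)
# A non-key attribute changes; the URI stays the same.
uri_v2 = key_uri("http://example.org/id/", "ex:Person", key,
                 {**bob, "ex:email": "bob@example.com"})
```

Here uri_v1 and uri_v2 are identical, so annotations attached to the
URI survive updates to any property that is not part of the key.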

Note: I'm posting this to RDF comments because I'm not soliciting
debate on this issue.  Rather, I would like to see it added to the
issues list for the next RDF committee.

Cheers, Bob

Received on Sunday, 14 March 2004 19:00:28 UTC