Re: RDF Keys, or why RDF is lousy at metadata annotations from Bob MacGregor on 2004-03-15 (www-rdf-comments@w3.org from January to March 2004)

From: Bob MacGregor <macgregor@ISI.EDU>
Date: Mon, 15 Mar 2004 08:14:52 -0800
To: Frank Manola <fmanola@acm.org>
Cc: www-rdf-comments@w3.org
Message-Id: <6.0.3.0.2.20040315080031.01c38d00@tnt.isi.edu>
Frank,

It is absolutely essential that keys be composite -- it would
be unfortunate if there were a long debate on that point.  You
might as well tell the database folks that they ought to do fine
with single-column keys.  As a single example, if I'm tracking
an entity moving in time, the obvious key for each "observation" is
the URI for the entity combined with the timestamp of the
observation.  The need for composite keys would seem to imply that OWL's
inverse functional property is irrelevant.
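
To make the observation example concrete, here is a minimal sketch in
Python.  The names (ObservationKey, ObservationStore, record) and the
example.org URI are hypothetical and chosen only for illustration; this is
not RDF or OWL machinery, just a picture of why the entity URI alone cannot
serve as the key.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObservationKey:
    """Composite key: the pair identifies an observation, neither part alone does."""
    entity_uri: str
    observed_at: datetime


class ObservationStore:
    """Toy in-memory store that enforces uniqueness of the composite key."""

    def __init__(self):
        self._rows = {}

    def record(self, entity_uri, observed_at, position):
        key = ObservationKey(entity_uri, observed_at)
        if key in self._rows:
            # Same entity at the same instant: this is the same observation.
            raise ValueError(f"duplicate observation key: {key}")
        self._rows[key] = position
        return key

    def __len__(self):
        return len(self._rows)

    def entity_uris(self):
        # Projecting onto one column loses uniqueness.
        return [key.entity_uri for key in self._rows]
```

Recording the same entity at two different timestamps yields two rows that
share one entity URI, which is exactly the case a single-property key
cannot express.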

As to whether keys belong in OWL or in the follow-on to RDF, I
would say that a strong argument in favor of putting keys into RDF
derives from the strong interaction between keys and the generation
of unique, repeatable URIs.  The URI issue seems to me more like an
RDF-level thing than an OWL thing.
So, if the RDF folks get serious about annotation and tackle the
URI problem, then they would want keys to be in place.
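
As a rough illustration of that interaction, the sketch below derives a
URI from the key property values only, so non-key attributes can change
without changing the URI.  The strategy for forming the URI is an open
question; hashing a canonical encoding of the key values with SHA-256 is
just one assumption, and KEY_PROPERTIES, mint_uri, and the ex: names are
hypothetical.

```python
import hashlib
import json

# Hypothetical key declarations, standing in for statements such as
#   ex:Observation hasKeyProperty ex:entity .
#   ex:Observation hasKeyProperty ex:timestamp .
KEY_PROPERTIES = {
    "ex:Observation": {"ex:entity", "ex:timestamp"},
}


def mint_uri(base, class_name, values):
    """Return a repeatable URI built from the key property values only."""
    # Sort the key properties so their declaration order does not matter.
    key_props = sorted(KEY_PROPERTIES[class_name])
    missing = [p for p in key_props if p not in values]
    if missing:
        raise ValueError(f"missing key properties: {missing}")
    # Canonical, unambiguous encoding of (property, value) pairs.
    canonical = json.dumps(
        [[p, str(values[p])] for p in key_props], separators=(",", ":")
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return base + digest
```

Calling mint_uri twice with the same key values gives the same URI even if
other attributes of the resource differ between the two calls.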

Cheers, Bob

At 07:25 AM 3/15/2004, Frank Manola wrote:

>Bob--
>
>Defining some kind of key mechanism for RDF might well be a good idea. An 
>alternative approach might be to start with OWL inverse functional 
>properties (which already define a kind of key mechanism) and extend them 
>to handle composite (more than one property) keys (i.e., it seems to me 
>there are two issues here;  one is whether keys should be in RDF rather 
>than (just) in OWL, and a second is whether they should be composite or 
>not).  However, it seems to me that some of the points in your argument 
>require a little further "qualification".
>
>Bob MacGregor wrote:
>
>>It's claimed that RDF and OWL are really great because they
>>facilitate making metadata assertions.  The reality is that the
>>RDF and OWL standards do a lousy job at supporting metadata
>>annotations for (at least) two reasons.
>>To do a good job of annotating (attaching metadata to) something, you need
>>    (1) to reify the something, and
>>    (2) a URI for it that is globally unique and "repeatable".
>>By now it's well understood that RDF statement reification is a loser,
>>because it chooses too small a grain size.  The best remedy is to
>>add contexts and quads to RDF.  If we did that, then (1) is taken
>>care of.  However, that's a subject for a different e-mail.
>
>
>I think this argument is a tad dubious.  After all, lots of people use RDF 
>to represent metadata about things without reifying anything, and are 
>reasonably happy with it.  For that matter, lots of metadata is contained 
>in databases, and the databases certainly don't use RDF reification (or 
>anything resembling it).  I really think reification is a red herring in 
>this discussion.  The important point here is that in order to represent 
>metadata about things in RDF you need something that enables you to 
>identify the things you're describing;  i.e., URIs or primary key values.
>
>
>>Here I'm really addressing the Bnode problem.  The proper scope
>>for a bnode is the model that it belongs to.  That means that
>>to reference a resource/entity outside of the model, you need
>>something other than a bnode--you need a resource with a globally
>>unique URI.  It's easy, but quite useless for annotation purposes, to use
>>"gensym" URIs, where you generate a unique URI on the fly, because
>>the next time you load the model, you get a different URI.  In other
>>words, the URI is not "repeatable".  To achieve repeatability, some
>>misguided proposals suggest concatenating or hashing all of the values
>>of attributes of a resource to create a repeatable URI.  If you do that,
>>the attributes of a resource cannot be updated without changing its URI.
>>The right solution is to generate a URI based on a minimal set of attribute
>>values that can be guaranteed not to change.  This is the definition of a
>>"key" (or primary key).  Proper database tables have primary key definitions.
>>XML items don't have keys, but they should.   And RDF classes should define
>>keys.
>>Defining a key on a class C in RDF is very simple.  A key consists of a
>>set of properties--the order doesn't matter.  If P1 and P2 are properties
>>that define a key for C, then we can invent a new property
>>"hasKeyProperty" and make two statements:
>>C hasKeyProperty P1 .
>>C hasKeyProperty P2 .
>>and we're done.  Now we have the foundation needed to synthesize
>>unique and repeatable URIs.
>>Hence, my proposal for the follow-on to the current RDF is to define
>>a new predicate equivalent to "hasKeyProperty".
>>Let me address two possible objections. One is that there may exist
>>more than one set of properties that defines a key for a given class.
>>The same is true for database systems, but they have wisely chosen
>>to identify one key as "primary" and declare that one as "the" key.
>>Second, the strategy for forming a unique URI based on a set
>>of key values is left open.  It would be REALLY useful if the
>>committee also tackled this problem.
>>Note: I'm posting this to RDF comments because I'm not soliciting
>>debate on this issue.  Rather, I would like to see it added to the
>>issues list for the next RDF committee.
>
>
>As I said above, I like the idea of being able to define keys.  However, 
>your discussion leaves a few considerations out.  First of all, lots of 
>database designs use "surrogate" keys (artificially-generated values of a 
>specialized property) rather than "natural" keys, particularly where the 
>natural key would have to be composite in order to uniquely identify the 
>entity.  It isn't all one or the other;  which to use is generally 
>considered a design issue.  See, e.g.,
>
>http://www.dbdebunk.com/page/page/626995.htm
>http://r937.com/20020620.html
>
>for some representative discussion.  Note in this connection that things 
>like "company id" or "employee id" (or social security number) are 
>artificially-generated for uniqueness, even though they often are 
>considered "natural" (because they are generated by some authority 
>external to the database).  A URI is in many respects just another 
>surrogate key, specifically generated for uniqueness in a specific context 
>(the Web). Many URIs could also be considered just as natural as a social 
>security number.  For example, identifying a company employee by something 
>like http://example.com/employees/<ssn value> could be considered as 
>prepending "context information" onto a "naturally-occurring" identifier 
>value (a social security number).
>
>Note that this also raises a second point, about the context within which 
>the key value or values are considered unique.  Keys used in databases 
>have an implicit context (that of the database table).  The use of keys in 
>RDF or OWL needs to address the context issue too (the OWL inverse 
>functional property concept includes some consideration of this).
>
>Finally, it's all very well to talk about attribute values that can be 
>"guaranteed not to change", but in practice this is mostly a fantasy. One 
>of the reasons for using surrogate keys in databases is that the folks 
>running the database system have more control over when the values change 
>than for "natural keys", but even then things change (e.g., one company 
>merges with another company, and you need to diddle with the employee 
>numbers, part numbers, and so on).    Although stability of identifier 
>values is certainly a Good Thing, in general, identifier mechanisms need 
>to provide ways of dealing with such situations in ways other than 
>assuming the values never change.
>
>--Frank
>
>
>
>>Cheers, Bob
>
>

=====================================
Robert MacGregor
Senior Project Leader
macgregor@isi.edu
Phone: 310/448-8423, Fax:  310/822-6592
Mobile: 310/251-8488

USC Information Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292
=====================================
Received on Monday, 15 March 2004 11:22:49 UTC