Re: RDF Keys, or why RDF is lousy at metadata annotations

Hi Bob

I am sympathetic, and I understand you don't want to get into a long 
debate; but I don't know what you are talking about.

>It's claimed that RDF and OWL are really great because they
>facilitate making metadata assertions.

Is it? By whom? My understanding is that they are intended to be 
languages for putting ontologies on the Web.

>The reality is that the
>RDF and OWL standards do a lousy job at supporting metadata
>annotations for (at least) two reasons.
>
>To do a good job of annotating (attaching metadata) to something, you need
>    (1) to reify the something, and
>    (2) the URI needs to be globally unique, and it needs to be "repeatable".

OK, granted (though why do you need to reify it? Isn't referring to it 
good enough?). Both of these issues seem to be concerned with how to 
attach URIs to the somethings, which indeed is an issue that the 
RDF/OWL specs mostly do not address, I think because it was outside 
the charter of the relevant working groups. (OWL does talk some about 
giving URI names to OWL ontologies, but not in any detail, and not 
about any other kind of naming.)

>By now it's well understood that RDF statement reification is a loser,
>because it chooses too small a grain size.

Well, I tend to agree, but that seems irrelevant to your concern, 
since RDF reification was only ever intended to reify RDF itself. I 
take it that you are interested in using RDF/OWL to give metadata for 
things other than RDF/OWL? Or are you just concerned with doing a 
better job than RDF reification for using RDF to describe other RDF? 
If so, join the club :-)

>  The best remedy is to
>add contexts and quads to RDF.  If we did that, then (1) is taken
>care of.  However, that's a subject for a different e-mail.
>
>Here I'm really addressing the Bnode problem.

Eh?? What bnode problem? (Did you just change the subject entirely, 
or is there a problem relating bnodes to URI attachment and 
reification?)

>The proper scope
>for a bnode is the model that it belongs to.

The graph it occurs in, yes.

>That means that
>to reference a resource/entity outside of the model, you need
>something other than a bnode

No, that does not follow at all. The bnode has a syntactic scope - it 
is essentially a bound variable -  but the entity or entities 
REFERRED TO by the bnode aren't scoped or limited in any way. Bnodes 
can refer to anything in the universe of discourse.

Did you mean that in order to provide a name for something that can 
be used outside the graph, you should not use a bnode as a 'label'? 
That would be correct, but I don't see why you call this a "problem". 
It seems kind of dumb on the face of it to use a blank node as a 
label. That isn't what blank nodes were ever intended to do.
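
To make the scope point concrete, here is a minimal Python sketch. It is 
not the API of any RDF library: the representation of a graph as a set of 
string triples, the "_:" test, the merge helper and the ex: names are all 
assumptions made for illustration. Blank node labels are local to the 
graph they occur in, so a merge renames them apart, while URIs carry over 
unchanged.

```python
from itertools import count

def is_bnode(term):
    # Convention assumed for this sketch: blank node labels start with "_:".
    return term.startswith("_:")

def merge(*graphs):
    """Merge graphs (sets of triples), renaming blank nodes apart per graph."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # local label -> fresh label, valid only for this graph

        def rename(term):
            if not is_bnode(term):
                return term
            if term not in renaming:
                renaming[term] = f"_:b{next(fresh)}"
            return renaming[term]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged

# The same label "_:x" is used in two different graphs.
g1 = {("_:x", "ex:name", '"Alice"')}
g2 = {("_:x", "ex:name", '"Bob"')}
merged = merge(g1, g2)
subjects = {s for s, _, _ in merged}
```

After the merge there are two distinct blank nodes: the shared label did 
not act as a name across graphs. Nothing in this says what either node 
refers to; that is not limited by the graph.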

>--you need a resource with a globally
>unique URI.

Right. But all URIs are globally unique, so you just need to have a 
URI attached to a resource.

>It's easy, but quite useless for annotation purposes, to use
>"gensym" URIs, where you generate a unique URI on the fly, because
>the next time you load the model, you get a different URI.

But they are both unique. I think you mean, a globally unique and stable URI.

There is absolutely no way to ensure that anything (data or 
otherwise) has only a single URI which refers to it.

>In other
>words, the URI is not "repeatable".

Well, the old URI still works, right? (If it worked before, what made 
it stop working because some other URI got generated??)
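
A small Python sketch of the distinction, assuming a hypothetical 
gensym_uri function that mints a fresh name on every load (the urn:uuid: 
form is just one possible scheme, chosen for illustration):

```python
import uuid

def gensym_uri():
    # Hypothetical gensym: a fresh random URN each time it is called.
    return f"urn:uuid:{uuid.uuid4()}"

first_load = gensym_uri()
second_load = gensym_uri()
# Both names are unique. What is missing is stability: nothing ties
# second_load to the resource that first_load was minted for.
```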

This seems to be the core issue. (?) That is, you want a way to 
(automatically?) generate URIs which can be used to name data (and 
hence to anchor metadata) stably, so that once created, they remain 
attached to the data they were created for. OK, good idea. But I 
don't think this is a job for RDF/OWL (and it wasn't in their 
charter): I think it has to do with URI deployment more generally.

The fact is, as I tried to tell the W3C Tech Plenary meeting in 
Boston last year, the Web as a whole has no way to give a name to 
ANYTHING. It has no protocols for baptism or naming: it just muddles 
along by relying on HTTP protocols and a few other essentially 
transmission protocols for locating things in networks, and pretends 
that it is assigning referents. But in fact, nothing on the Web 
really assigns referents at all.

>  To achieve repeatability, some misguided
>proposals suggest concatenating or hashing all of the values of attributes
>of a resource to create a repeatable URI.  If you do that, the attributes
>of a resource cannot be updated.
>
>The right solution is to generate a URI based on a minimal set of attribute
>values that can be guaranteed not to change.

Where are those attributes to be found, in general? A URI can 
identify (= be the name of, refer to) anything. Most things don't 
have unchanging attributes.

>This is the definition of a
>"key" (or primary key).  Proper database tables have primary key definitions.
>XML items don't have keys, but they should.   And RDF classes should define
>keys.

I presume you mean RDFS classes (and OWL classes?). I don't see how 
this would be possible in general. Are you saying that all classes 
should provide keys? But classes might contain anything. What about 
classes of things like abstractions, or classes of entities that have 
only a transient existence? What about rdf:Resource?  What about 
entities that are not known to be in any particular class? What about 
entities that are in more than one class?

>Defining a key on a class C in RDF is very simple.  A key consists of a
>set of properties--the order doesn't matter.  If P1 and P2 are properties
>that define a key for C, then we can invent a new property 
>"hasKeyProperty" and
>make two statements:
>
>C hasKeyProperty P1 .
>C hasKeyProperty P2 .
>
>and we're done.

Not quite. That does not refer to a set of properties. You cannot 
assume that because

C hasKeyProperty P19 .

has not been asserted, that it will not be true. In general in RDF 
there is no way to 'close off' any collection other than by using the 
collection vocabulary.
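
The difference can be sketched in Python. This is an illustration only: 
graphs are plain sets of string triples, and ex:hasKeyProperty, ex:hasKey 
and both helper functions are hypothetical names, not part of RDF or OWL. 
A query over individual triples only reports what has been asserted so 
far, whereas an rdf:first/rdf:rest list at least states where it ends.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def asserted_key_properties(graph, cls):
    # Only what has been asserted so far; says nothing about what else may hold.
    return {o for s, p, o in graph if s == cls and p == "ex:hasKeyProperty"}

def read_collection(graph, head):
    """Walk an rdf:first/rdf:rest list from head until rdf:nil."""
    items = []
    while head != RDF_NIL:
        items.append(next(o for s, p, o in graph if s == head and p == RDF_FIRST))
        head = next(o for s, p, o in graph if s == head and p == RDF_REST)
    return items

open_graph = {
    ("ex:C", "ex:hasKeyProperty", "ex:P1"),
    ("ex:C", "ex:hasKeyProperty", "ex:P2"),
}
# Someone else can always add another triple later.
later = open_graph | {("ex:C", "ex:hasKeyProperty", "ex:P19")}

closed_graph = {
    ("ex:C", "ex:hasKey", "_:l1"),
    ("_:l1", RDF_FIRST, "ex:P1"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "ex:P2"), ("_:l2", RDF_REST, RDF_NIL),
}
key_list = read_collection(closed_graph, "_:l1")
```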

But in any case, I fail to see how this provides what you want. 
Suppose these triples are asserted: how does that give you an 
actual key? For example, you might not know any values for those 
properties.

>Now we have the foundation needed to synthesize
>unique and repeatable URIs.

I do not follow you. In what way does this provide a basis for 
synthesizing URIs? And how does membership in a class provide any 
guarantee of repeatability? There is no implication that a class must 
contain permanent things, or even that the class itself has any 
permanent or lasting existence: consider for example classes which 
are themselves only mentioned in passing, such as OWL restriction 
classes.
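
One possible reading of the proposal, sketched in Python under 
assumptions that are not in the original message: the key_uri helper, the 
use of SHA-256 and the http://example.org/id/ base are all hypothetical, 
since the proposal leaves the URI-forming strategy open. The sketch shows 
both what the scheme would buy (the URI survives changes to non-key 
attributes) and where it gives out (no known value, no key).

```python
import hashlib

def key_uri(base, key_properties, description):
    """Derive a URI from the values of the key properties, if all are known."""
    parts = []
    for prop in sorted(key_properties):  # sorted, so property order does not matter
        if prop not in description:
            return None  # no value known for a key property: no key, no URI
        parts.append(f"{prop}={description[prop]}")
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f"{base}{digest[:16]}"

BASE = "http://example.org/id/"  # placeholder namespace for the sketch
keys = {"ex:P1", "ex:P2"}

before = {"ex:P1": "a", "ex:P2": "b", "ex:comment": "draft"}
after = {"ex:P1": "a", "ex:P2": "b", "ex:comment": "final"}
partial = {"ex:P1": "a"}  # value of ex:P2 is not known

uri_before = key_uri(BASE, keys, before)
uri_after = key_uri(BASE, keys, after)
uri_partial = key_uri(BASE, keys, partial)
```

Here uri_before and uri_after are the same string, while uri_partial is 
None. Declaring the key properties on the class does nothing by itself to 
supply the missing value.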

>Hence, my proposal for the follow-on to the current RDF is to define
>a new predicate equivalent to "hasKeyProperty".

If I follow this proposal, how does

C  rdf2:hasKeyProperty  P .

differ from

P rdfs:domain C .
P rdf:type owl:FunctionalProperty .

?

>Let me address two possible objections. One is that there may exist
>more than one set of properties that defines a key for a given class.
>The same is true for database systems, but they have wisely chosen
>to identify one key as "primary" and declare that one as "the" key.

That is the least of the problems. The most serious one is that many 
(most?) classes will have no key properties.  Others are: given an 
individual, how to choose which class to use to identify it; how the 
key properties of a derived class are supposed to relate to those of 
the classes it is derived from; how to handle owl:sameAs (i.e. equality) in a 
key-based class framework; and how to relate this use of URIs to 
other constraints on URI usage.

>Second, the strategy for forming a unique URI based on a set
>of key values is left open.  It would be REALLY useful if the
>committee also tackled this problem.

Way beyond the scope of an RDF WG. This is a TAG-level issue.

>Note: I'm posting this to RDF comments because I'm not soliciting
>debate on this issue.  Rather, I would like to see it added to the
>issues list for the next RDF committee.

I'm not empowered to do anything about this, but IMO this topic isn't 
even in the scope of an RDF group. It is both too special and too 
general for consideration only within the context of RDF.

Pat

>
>Cheers, Bob


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 15 March 2004 15:57:29 UTC