Re: Resources and URIs - different readings of RDF M&S? from Sergey Melnik on 1999-12-08 (www-rdf-interest@w3.org from December 1999)

From: Sergey Melnik <melnik@DB.Stanford.EDU>
Date: Tue, 07 Dec 1999 21:48:16 -0800
To: Dan Brickley <danbri@w3.org>
CC: RDF Interest Group <www-rdf-interest@w3.org>
Message-ID: <384DF120.CB3F0B5D@db.stanford.edu>
Now I have a better idea of what we are discussing, too. What I had in
mind was a purely syntactical issue. However, it seems that I
underestimated the scope of the problem (correct me if I'm wrong - this
is an important point) and we have different readings of the RDF M&S
specification in mind. 

MODEL LEVEL:
------------

I believe, Dan, your reading of the RDF specs is the following:

There exist two intrinsically different types of resources: named
resources and anonymous resources. Anonymous resources are
*distinguishable* on the model level. That means, the application that
looks at a resource (e.g. using an API) must be able to tell whether it
is anonymous or not (e.g. it starts with a specific prefix).
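A minimal sketch of what such a check could look like under this first reading. The prefix is an assumption borrowed from the example URIs used in this message; RDF M&S does not define one, and `is_anonymous` is a hypothetical helper, not part of any RDF API.

```python
# Hypothetical prefix marking anonymous resources; not defined by RDF M&S.
ANON_PREFIX = "urn:rdf:resource:"


def is_anonymous(uri: str) -> bool:
    """Return True if the URI carries the (assumed) anonymous-resource prefix."""
    return uri.startswith(ANON_PREFIX)


anon = is_anonymous("urn:rdf:resource:987654321")
named = is_anonymous("http://www.w3.org/staffId/765432")
```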

For this reading, what are the semantics of an anonymous resource as
opposed to a named one? Since the application cannot see an empty oval
with outgoing arcs, it must use a URI for an anonymous resource. Thus,
whenever the application encounters an anonymous resource, it might
think the following:

"This is a URI that identifies this resource well enough in the given
context. However, there exists a more persistent URI for this resource,
which I don't know."

Thus, looking at say urn:rdf:resource:987654321, the application treats
it as a temporary placeholder for something like
http://www.w3.org/staffId/765432. The staff ID is obviously a better
means to identify the person. However, we can easily imagine a more
stable, employer-independent identifier, say
uri:uk:gov:person:123456789, or even a citizenship-independent one,
uri:people:357246890, that is better still.

In other words, it seems to me that there is no difference in principle
between an anonymous resource and an explicitly named one. A real-world
entity can have different names; there is nothing wrong with that. Some of
them are more stable, others are less. Furthermore, the stability is
questionable, even for anonymous resources. Imagine the following piece
of a breaking news report:

"He saw something surrounded by a myriad of flashing lights rotating at
a breathtaking speed"

Since we do not have a better name for "something", it would probably
get an "anonymous" URI like uri:rdf:resource:XYZ in the report. However,
as more people start referring to it, it can become the "official" name
for the first UFO successfully landed in North America.

Thus, in my view, anonymous resources are first-class citizens in RDF
and should not be discriminated against just because their URIs
(arguably) have less persistence than other URIs.

Therefore, I prefer the following usage scenario for anonymous URIs: if
we speak about something whose "well-known" name is not available (e.g.
we are just too lazy to look up the ISBN of a book), we introduce a
new unique identifier for it, created e.g. using an algorithm with
strong uniqueness guarantees (such as a time-based UUID). You can call it an
"anonymous" URI if you want, but it carries no additional semantics.
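
As a rough illustration of this scenario, a fresh identifier could be minted from a time-based UUID. This is only a sketch: the `urn:rdf:resource:` prefix is taken from the examples in this message and is not defined by RDF M&S, and `new_resource_uri` is a hypothetical helper name.

```python
import uuid

# Hypothetical prefix, borrowed from the example URIs in this message.
PREFIX = "urn:rdf:resource:"


def new_resource_uri() -> str:
    """Mint a fresh URI for a resource whose well-known name is unavailable."""
    # uuid1() builds the value from a timestamp, a clock sequence and a
    # node ID, so successive calls yield distinct identifiers.
    return PREFIX + uuid.uuid1().hex


a = new_resource_uri()
b = new_resource_uri()
```

Nothing in the resulting URI marks it as "anonymous"; an application sees just another name, which is the point of the second reading.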

The above procedure is the second reading (my version) of the RDF M&S
specs. The application cannot distinguish whether a resource is
anonymous or "kind of" persistent. If my reading is wrong and the first
is the correct one, RDF M&S *must* specify how the application can
distinguish between anonymous and named resources. Must they have a
special prefix? The spec does not answer this question.

SYNTAX LEVEL:
-------------

This is my original statement of the problem. That is, I assumed that
anonymous URIs (empty ovals in the specs) are just a by-product of a
convenience: humans producing RDF by hand should not have to
invent/generate URIs themselves.

If RDF is created by an application, every node already has a URI, so
the problem does not exist.

If a human is writing RDF directly in a serialized form, (s)he does not
want to run a random URI generator every time (s)he mentions something
whose identity is not known at the moment. Thus, this task is shifted to
the parser. And, for interoperability, the anonymous URIs created for
the resources by two different parsers should be identical, given the
same content with the same origin.
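
One way a parser could meet this requirement is to derive the URI from the origin, the content, and the position of the unnamed node, rather than from a random source. The sketch below is an assumption, not an algorithm from RDF M&S: the hash function, the `urn:rdf:resource:` prefix, the `anonymous_uri` name, and the idea of numbering unnamed nodes in document order are all illustrative choices. It also ignores canonicalization of the content (whitespace, encoding), which real parsers would have to agree on.

```python
import hashlib

# Hypothetical prefix, borrowed from the example URIs in this message.
PREFIX = "urn:rdf:resource:"


def anonymous_uri(origin: str, content: str, position: int) -> str:
    """Derive a URI for the position-th unnamed node of a document.

    The result depends only on the arguments, so any parser applying the
    same rule to the same document computes the same URI.
    """
    digest = hashlib.sha256()
    for part in (origin, content, str(position)):
        data = part.encode("utf-8")
        # Length-prefix each field so that different ways of splitting the
        # same bytes across fields cannot produce the same hash input.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return PREFIX + digest.hexdigest()


# Hypothetical origin and content, for illustration only.
doc = "<Book><dc:Creator>Sergey Melnik</dc:Creator></Book>"
first = anonymous_uri("http://example.org/books.rdf", doc, 0)
again = anonymous_uri("http://example.org/books.rdf", doc, 0)
other = anonymous_uri("http://example.org/books.rdf", doc, 1)
```

Two independent parsers running this rule over the same document from the same origin agree on `first`, while a second unnamed node in the same document gets a different URI.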

------

> > By restricting the context say to the origin and content of the RDF
> > description the problem can be solved relatively easily.
> 
> I'm still not convinced. Are you claiming that anonymity is a property
> intrinsic to a resource rather than it's mention in some context?

You said you were not dealing with semantics. Yet you are, and at a very
deep level. As it stands now, the RDF model does not give you a built-in
facility to differentiate between a resource and a "mention" of the
resource (whatever that is). The "mention" of the resource would be just
another, alternative URI for the same real-world thing. The "real" URI
for the resource might not even exist. If I write in RDF/XML

<Book>
  <dc:Creator>Sergey Melnik</dc:Creator>
  <dc:Publisher>Addison-Wesley</dc:Publisher>
</Book>

then I'm speaking of an instance of Book having dc:Creator this and
dc:Publisher that. There is no "better" URI for this resource, such as
an ISBN; the thing (a book with such properties) just does not exist at
all.

The unique parsing algorithm would guarantee that the same URI is
generated for this imaginary book by every parser fetching this content
from my Web page. That frees me from the need to write

<Book rdf:about="urn:rdf:resource:238e89a878d5e302aed362e2d9a0caa8">
  <dc:Creator>Sergey Melnik</dc:Creator>
  <dc:Publisher>Addison-Wesley</dc:Publisher>
</Book>

if I want to allow other people to refer to the product of my
imagination. In the current specs, by not providing the URI explicitly I
*prevent* people from referring to my stuff without copying the entire
description.


Sergey
Received on Wednesday, 8 December 1999 00:42:40 UTC