- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Thu, 22 May 2003 14:07:58 -0400 (EDT)
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: "(www-rdf-dspace@w3.org)" <www-rdf-dspace@w3.org>
Butler, Mark wrote:
>5. Due to 3, URIs tend to mix identity and version (i.e. date, time). There
>are some disadvantages to mixing these two different axes, particularly as
>different URIs mix them in different ways so they are not algorithmically
>separable. Perhaps it might be useful to separate these axes, as then it
>would be possible to determine from the URIs alone that two resources are
>versions of the same thing. Now this is controversial, as we've already
>discussed an opposing view e.g. identifiers must be random. But from the
>CC/PP work, I'm conscious things are much easier for processor developers as
>this may be easier than keeping track of a bunch of metadata that says all
>these identifiers refer to versions of the same resource. For more details
>see
>http://www.hpl.hp.com/techreports/2003/HPL-2003-31.html
>
>
Genesis' position is that it isn't identity and version that are
conflated, but Identity and Content. By 'Identity' and 'Content' I mean
the same distinction as the semweb distinction between 'stating' and
'statement'. The 'Identity' incorporates the concept of an instantiation
and possible metadata such as date and time (your version), but also
other metacharacteristics like owner, access permissions, and trust
among others. In contrast to Identity, Content is data divorced from
the context of its instantiation. Content based identifiers such as SHA
hashes indicate the data that is in a document, but give no information
about where that data came from, who owns it, or if there is a more
recent version.
In Genesis, rather than rely entirely on either content-based identifiers
or resource-based identifiers, we combine the two; a Genesis URI
looks something like:
genesis://host/genesis-server/resourceid;contentid
This sets up a means of syntactic transformation of URIs; if the program
that is retrieving the URI only needs the data contained in the
reference then the genesis id can be transformed into a content based
identifier:
hdl:sha1/contentid
Using this identifier the content of the specified document can be
retrieved from any peer that happens to be able to respond to the
content hash, allowing document contents to be widely mirrored
throughout the network.
If the application that is retrieving the data has some other interest
in the document, such as whether that document ever in fact had those
contents, then the full genesis id can be converted to a URL for
retrieval from its canonical owner.
http://host/genesis-server/resourceid;contentid
Analogously, if the application is uninterested in the specific version,
and only wants to retrieve the contents that were most recently assigned
to the resource identifier, then the contentid can be dropped.
http://host/genesis-server/resourceid
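The three transformations above are purely syntactic, so they can be sketched in a few lines of Python. This is only an illustration under assumptions: the names GenesisId, parse_genesis, to_content_handle, to_canonical_url and to_latest_url are hypothetical, and the parsing assumes the simple host/path;contentid layout shown in this message rather than any published Genesis grammar.

```python
from typing import NamedTuple, Optional


class GenesisId(NamedTuple):
    host: str
    path: str                  # e.g. "genesis-server/resourceid"
    content_id: Optional[str]  # None for a bare resource reference


def parse_genesis(uri: str) -> GenesisId:
    """Split genesis://host/path;contentid into its parts (assumed layout)."""
    prefix = "genesis://"
    if not uri.startswith(prefix):
        raise ValueError("not a genesis URI: " + uri)
    host, sep, rest = uri[len(prefix):].partition("/")
    if not host or not sep or not rest:
        raise ValueError("missing host or resource path: " + uri)
    path, _, cid = rest.partition(";")
    return GenesisId(host, path, cid or None)


def to_content_handle(g: GenesisId) -> str:
    """Content only: retrievable from any peer that can answer the hash."""
    if g.content_id is None:
        raise ValueError("no content id to derive a handle from")
    return "hdl:sha1/" + g.content_id


def to_canonical_url(g: GenesisId) -> str:
    """Specific version, fetched from the canonical owner."""
    url = "http://" + g.host + "/" + g.path
    return url + ";" + g.content_id if g.content_id else url


def to_latest_url(g: GenesisId) -> str:
    """Whatever content was most recently assigned to the resource."""
    return "http://" + g.host + "/" + g.path


g = parse_genesis("genesis://host/genesis-server/resourceid;contentid")
print(to_content_handle(g))
print(to_canonical_url(g))
print(to_latest_url(g))
```

Because each form is derived from the full identifier by dropping or rearranging parts, a client can pick the retrieval strategy it needs without any extra metadata lookup.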
It is our opinion that RDF resource references should be listed in the
combination form, that is as full genesis identifiers, since the RDF
creator will have no way of predicting which of these uses a specific
application will have for its resource references, with the exception
that links into the future (for which there is no content at present)
should be expressed by their resource identification.
>6. The concept behind PURLs and Handles is good, i.e. when a resource moves
>you don't need to worry about it. DNS already has a level of indirection
>built in, so why not do this for retrievable resources? This is discussed in
>the Stone paper cited above.
>
>
There are multiple ways to solve 404 errors, including (among others)
URL forwarding, and DNS updates. I can't see any obvious reasons why
handles should be considered more long-term retrievable than URLs are.
Perhaps someone can explain.
Within the domain of URIs, if the custodian of the URI doesn't want to
maintain its linkage over time (e.g., the domain name gets taken away,
the company goes bust, etc.) then one must rely on higher-level social
abstractions. A new web site replaces the old one; update your links if
you care about retrievability.
My problem with URNs or Handles is that I don't see any mechanism for
arbitration. What keeps someone from stepping on your namespace and
allocating invalid or conflicting identifiers? CORBA-style UUIDs
(Windows GUIDs?) fall prey to malice and stupidity. And content-based
identifiers can only identify content, not instance.
--
========================================================
Kevin Smathers kevin.smathers@hp.com
Hewlett-Packard kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd. 650-857-4477 work
M/S 1135 650-852-8186 fax
Palo Alto, CA 94304 510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Friday, 23 May 2003 03:16:25 UTC