- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Thu, 22 May 2003 14:07:58 -0400 (EDT)
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: "(www-rdf-dspace@w3.org)" <www-rdf-dspace@w3.org>
Butler, Mark wrote:

>5. Due to 3, URIs tend to mix identity and version (i.e. date, time). There
>are some disadvantages to mixing these two different axes, particularly as
>different URIs mix them in different ways so they are not algorithmically
>separable. Perhaps it might be useful to separate these axes, as then it
>would be possible to determine from the URIs alone that two resources are
>versions of the same thing. Now this is controversial, as we've already
>discussed an opposing view e.g. identifiers must be random. But from the
>CC/PP work, I'm concious things are much easier for processor developers as
>this may be easier than keeping track of a bunch of metadata that says all
>these identifiers refer to versions of the same resource. For more details
>see
>http://www.hpl.hp.com/techreports/2003/HPL-2003-31.html
>
>

Genesis' position is that it isn't identity and version that are conflated, but Identity and Content. By 'Identity' and 'Content' I mean the same distinction as the semweb distinction between 'stating' and 'statement'. The 'Identity' incorporates the concept of an instantiation and possible metadata such as date and time (your version), but also other metacharacteristics like owner, access permissions, and trust among others.

In contrast to Identity, Content is data divorced from the context of its instantiation. Content based identifiers such as SHA hashes indicate the data that is in a document, but give no information about where that data came from, who owns it, or if there is a more recent version.

In Genesis, rather than rely entirely on either content-based identifiers or resource-based identifiers, we combine the two; a genesis URI looks something like:

    genesis://host/genesis-server/resourceid;contentid

This sets up a means of syntactic transformation of URIs. If the program that is retrieving the URI only needs the data contained in the reference, then the genesis id can be transformed into a content-based identifier:

    hdl:sha1/contentid

Using this identifier, the content of the specified document can be retrieved from any peer that happens to be able to respond to the content hash, allowing document contents to be widely mirrored throughout the network.

If the application that is retrieving the data has some other interest in the document, such as whether that document ever in fact had those contents, then the full genesis id can be converted to a URL for retrieval from its canonical owner:

    http://host/genesis-server/resourceid;contentid

Analogously, if the application is uninterested in the specific version, and only wants to retrieve the contents that were most recently assigned to the resource identifier, then the contentid can be dropped:

    http://host/genesis-server/resourceid

It is our opinion that RDF resource references should be listed in the combination form, that is, as full genesis identifiers, since the RDF creator will have no way of predicting which of these uses a specific application will have for its resource references. The one exception is links into the future (for which there is no content at present), which should be expressed by their resource identifier alone.

>6. The concept behind PURLs and Handles is good, i.e. when a resource moves
>you don't need to worry about it. DNS already has a level of indirection
>built in, so why not do this for retrievable resources? This is discussed in
>the Stone paper cited above.
>
>

There are multiple ways to solve 404 errors, including (among others) URL forwarding and DNS updates.
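
As an aside on the genesis URI forms given earlier in this message, the three syntactic transformations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and not part of Genesis, and the sketch assumes the content id is whatever follows the first ';' in the URI, exactly as in the example forms above.

```python
GENESIS_PREFIX = "genesis://"  # scheme prefix taken from the example URI


def split_genesis_uri(uri):
    """Return (location, content_id); content_id is None if no ';' part exists.

    'location' is 'host/genesis-server/resourceid' from the example form.
    """
    if not uri.startswith(GENESIS_PREFIX):
        raise ValueError("not a genesis URI: " + uri)
    rest = uri[len(GENESIS_PREFIX):]
    location, sep, content_id = rest.partition(";")
    return location, (content_id if sep else None)


def to_content_identifier(uri):
    """Content only: retrievable from any peer that can answer the hash."""
    _, content_id = split_genesis_uri(uri)
    if content_id is None:
        raise ValueError("URI carries no content id: " + uri)
    return "hdl:sha1/" + content_id


def to_versioned_url(uri):
    """Resource plus content: ask the canonical owner for this exact version."""
    location, content_id = split_genesis_uri(uri)
    if content_id is None:
        return "http://" + location
    return "http://" + location + ";" + content_id


def to_latest_url(uri):
    """Resource only: whatever content was most recently assigned to it."""
    location, _ = split_genesis_uri(uri)
    return "http://" + location


example = "genesis://host/genesis-server/resourceid;contentid"
print(to_content_identifier(example))
print(to_versioned_url(example))
print(to_latest_url(example))
```

A reference into the future, which has no content id yet, would pass through to_latest_url unchanged apart from the scheme, while to_content_identifier would reject it.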

I can't see any obvious reasons why handles should be considered more long-term retrievable than URLs are. Perhaps someone can explain. Within the domain of URIs, if the custodian of the URI doesn't want to maintain its linkage over time (e.g. domain name gets taken away, company goes bust, etc.) then one must rely on higher level social abstractions. A new web site replaces the old one; update your links if you care about retrievability.

My problem with URNs or Handles is that I don't see any mechanism for arbitration. What keeps someone from stepping on your namespace and allocating invalid or conflicting identifiers? CORBA style UUIDs (Windows GUIDs?) fall prey to malice and stupidity. And content based identifiers can only identify content, not instance.

--
========================================================
Kevin Smathers                kevin.smathers@hp.com
Hewlett-Packard               kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.            650-857-4477 work
M/S 1135                      650-852-8186 fax
Palo Alto, CA 94304           510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");

Received on Friday, 23 May 2003 03:16:25 UTC