- From: David R. Karger <karger@theory.lcs.mit.edu>
- Date: Tue, 8 Apr 2003 00:47:29 -0400
- To: ks@micky.hpl.hp.com
- CC: Mark_Butler@hplb.hpl.hp.com, www-rdf-dspace@w3.org
Date: Mon, 7 Apr 2003 08:47:07 -0700
From: Kevin Smathers <ks@micky.hpl.hp.com>
Cc: Mark_Butler@hplb.hpl.hp.com, www-rdf-dspace@w3.org

On Mon, Apr 07, 2003 at 02:10:59AM -0400, David R. Karger wrote:

> Randomness avoids collision? I would say rather that federation
> avoids collision, randomness only invites it.
>
> Depends. If URLs are random 128-bit integers, collisions are
> hell-freezes-over unlikely.

Assuming one has a mechanism for enforcing randomness, then I agree. The
problem with MD5 sums is that the contents of the URL become immutably
linked to the URL itself. Invariant documents have lots of nice features;
distribution, cache control, and cache verification become trivial, but on
the down side there is no consistent address for the top of the tree of a
document's history. If you want to be able to modify your document after
publishing its MD5-sum URL, then you will also have to accept that
maintaining randomness is cooperative.

I find it quite useful to distinguish between a particular bit blob (which
by definition cannot change: if it changes, it is a different blob) and a
document (which at one time can be a certain bit blob and at another time
a different bit blob). Bit blobs can benefit from MD5 URLs. As I mentioned
in previous email, mutable things can't have MD5 URLs. It's nice to be
able to say "at this time this document had these bits," or "this bit
blob was replaced by this one."

Cooperative randomness reduces the likelihood of collision from
hell-freezes-over unlikely to zero: freedom from collision becomes a
logical truth rather than a probabilistic one.

[...]

> The URL
> remains the same, there just needs to be a way of interpreting additional
> constraints on the content within that URL. I think this is analogous
> to the identification of a ViewPart to extract a particular view of
> an object within Haystack (do I remember a SongPreview10Seconds View
> Part, or something similar?). In this case the preview isn't meant
> to be an aspect of the part however, but a name, which should be
> interpreted by the part to extract the relevant information.
>
> Having trouble parsing this.

Perhaps I can clarify with an example. Consider a DVD archive that
contains the theatrical release of "The Lord of the Rings". The URL for
this sample asset, for the sake of argument, is
'http://simile.org/the-lord-of-the-rings-theatrical-release.dvd'. Now
suppose I have created a DVD player that will read metadata describing any
movie and use it to modify the way that movie is played back. For example,
my DVD player can read metadata describing scenes that depict violence and
remove them during playback of the movie.

Obviously the metadata read by the DVD player will have to include data
that identifies the parts of the overall movie that represent the selected
content. Using a URL to represent the content is insufficient -- we can't
create new URLs for every possible subregion of a movie, and even if we
did so, such an approach wouldn't help in finding and playing back parts
of the movie that do not correspond to that URL. Naming, as described in
section 3.2.7, has nothing to do with the URL for the asset. The purpose
of naming is to create a linkage between the metadata and the movie
subregion. Stepping out of our example, the purpose of Naming in this
document is to represent other assets in ways that URLs cannot. Such
linkages are necessarily specific to the type of data being indexed, so
they cannot be generalized to a single technology, but that doesn't mean
that we can't create a pattern around them.

While using URLs with semantics is one option, an alternative way to
specify a particular subpart of the movie is with a blob of RDF. E.g.,
there is a resource foo (no semantics) and assertions "foo fragment-of
the-lord-of-the-rings", "foo start-offset 300", and "foo end-offset 500".
Whatever semantics I intend to place in the URL, I can instead, without
any loss of expressive power, place in a blob of RDF statements. This
leaves me with URLs containing no semantics at all, which has a
consistency I like.

The rest of that paragraph is a (probably lame) attempt to link this
pattern to its structural equivalent in Haystack.

> To my mind, Semantic Web without the Web is just Semantic Filesystem.
>
> Perhaps, but nobody knows how to build a decent semantic filesystem.
> While one might argue that Google (analogue of what I said about
> centralized scenario above) is "just filesystem", it is in fact a big
> step forward because all the information it centralizes is interlinked
> in an interesting way.

I would not argue that Google is 'just filesystem'. I think that Google's
approach of gathering metadata and querying locally is a completely valid
approach to solving the distribution problem for metadata. There may be
other approaches that are also useful, especially for specific types of
data such as invariant data, or strongly partitioned (federated) data.

This is exactly the point I was making in previous email. While there are
many ways to cope with metadata distribution, collect+local-query is
perhaps the simplest possible way, and it reduces the problem to
understanding how to do local query -- which we still don't know how to
do, so I am recommending we tackle that first.
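A minimal sketch, in Python, of the bit-blob/document distinction
discussed above: immutable blobs get content-addressed MD5 names, while a
mutable document keeps one stable, randomly minted 128-bit name and a
history of blob names. The 'urn:md5:' and 'urn:rand:' forms and the
Document class are invented for illustration; they are not from the
thread.

    import hashlib
    import secrets

    def blob_url(blob: bytes) -> str:
        # Content-addressed name: immutably tied to these exact bits; if
        # the bits change, it is a different blob with a different name.
        # (Hypothetical 'urn:md5:' form, for illustration only.)
        return "urn:md5:" + hashlib.md5(blob).hexdigest()

    def random_url() -> str:
        # Random 128-bit name: collision is hell-freezes-over unlikely,
        # but only while every minter really draws uniformly at random.
        return "urn:rand:" + secrets.token_bytes(16).hex()

    class Document:
        # A mutable document: a stable name (the consistent address for
        # the top of the history tree) plus a history of immutable blobs.
        def __init__(self):
            self.url = random_url()
            self.history = []   # (when, blob_url) pairs

        def publish(self, when: str, blob: bytes) -> None:
            self.history.append((when, blob_url(blob)))

    doc = Document()
    doc.publish("2003-04-07", b"first draft")
    doc.publish("2003-04-08", b"revised draft")
    for when, url in doc.history:
        print(when, url)   # "at this time this document had these bits"

The document's own URL never changes, so it can serve as the top-of-tree
address that an MD5-sum URL cannot provide, while each revision keeps a
verifiable content-addressed name.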
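The blob-of-RDF naming pattern from the reply above, sketched with plain
Python tuples standing in for RDF statements. The predicates fragment-of,
start-offset, and end-offset come from the example in the message; the
fragments_of helper is a hypothetical stand-in for whatever the
violence-filtering DVD player would do.

    LOTR = "http://simile.org/the-lord-of-the-rings-theatrical-release.dvd"

    # The resource 'foo' carries no semantics of its own; the statements
    # about it do all the work.
    blob = [
        ("foo", "fragment-of",  LOTR),
        ("foo", "start-offset", 300),
        ("foo", "end-offset",   500),
    ]

    def fragments_of(graph, asset):
        # Resolve every named subregion of an asset to its offsets.
        for (s, p, o) in graph:
            if p == "fragment-of" and o == asset:
                start = next(v for (s2, p2, v) in graph
                             if s2 == s and p2 == "start-offset")
                end = next(v for (s2, p2, v) in graph
                           if s2 == s and p2 == "end-offset")
                yield (s, start, end)

    for name, start, end in fragments_of(blob, LOTR):
        print(name, start, end)   # -> foo 300 500

Note that the asset URL stays exactly as published: all the semantics
live in the statements, none in the name.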
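And a sketch of the collect+local-query pattern described in the closing
paragraphs, under the assumption that each remote source can simply be
asked for its statements. The source URLs and both helpers are
placeholders invented for illustration.

    # Stand-ins for remote metadata sources a harvester would fetch
    # over HTTP.
    sources = {
        "http://simile.org/metadata/a": [("foo", "fragment-of", "lotr-dvd")],
        "http://simile.org/metadata/b": [("foo", "start-offset", 300),
                                         ("foo", "end-offset",   500)],
    }

    def collect(srcs):
        # The distribution half: copy every source's statements into one
        # local store, the way a crawler does.
        local = []
        for triples in srcs.values():
            local.extend(triples)
        return local

    def query(store, s=None, p=None, o=None):
        # The local-query half: match a pattern against the collected
        # store; None acts as a wildcard.
        return [t for t in store
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    store = collect(sources)
    print(query(store, s="foo"))   # everything known locally about 'foo'

Everything hard then lives in query(), which is exactly the local-query
problem the message recommends tackling first.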
Received on Tuesday, 8 April 2003 00:43:13 UTC