- From: David R. Karger <karger@theory.lcs.mit.edu>
- Date: Tue, 8 Apr 2003 00:47:29 -0400
- To: ks@micky.hpl.hp.com
- CC: Mark_Butler@hplb.hpl.hp.com, www-rdf-dspace@w3.org
Date: Mon, 7 Apr 2003 08:47:07 -0700
From: Kevin Smathers <ks@micky.hpl.hp.com>
Cc: Mark_Butler@hplb.hpl.hp.com, www-rdf-dspace@w3.org

On Mon, Apr 07, 2003 at 02:10:59AM -0400, David R. Karger wrote:

> Randomness avoids collision? I would say rather that federation
> avoids collision, randomness only invites it.
>
> Depends. If URLs are random 128-bit integers, collisions are
> hell-freezes-over unlikely.

Assuming one has a mechanism for enforcing randomness, then I agree. The
problem with MD5 sums is that the contents of the URL become immutably
linked to the URL itself. Invariant documents have lots of nice features;
distribution, cache control, and cache verification become trivial, but on
the down side there is no consistent address for the top of the tree of a
document's history. If you want to be able to modify your document after
publishing its MD5-sum URL, then you will also have to accept that
maintaining randomness is cooperative.

I find it quite useful to distinguish between a particular bit blob (which
by definition cannot change: if it changes, it is a different blob) and a
document (which at one time can be a certain bit blob and at another time
a different bit blob). Bit blobs can benefit from MD5 URLs. As I mentioned
in previous email, mutable things can't have MD5 URLs. It's nice to be
able to say "at this time this document had these bits," or "this bit
blob was replaced by this one."

Cooperative randomness reduces the likelihood of collision from
hell-freezes-over unlikely to zero: freedom from collision becomes a
logical truth rather than a probabilistic one.

[...]

> The URL
> remains the same, there just needs to be a way of interpreting additional
> constraints on the content within that URL. I think this is analogous
> to the identification of a ViewPart to extract a particular view of
> an object within Haystack (do I remember a SongPreview10Seconds View
> Part, or something similar?). In this case the preview isn't meant
> to be an aspect of the part however, but a name, which should be
> interpreted by the part to extract the relevant information.
>
> Having trouble parsing this.

Perhaps I can clarify with an example. Consider a DVD archive that
contains the theatrical release of "The Lord of the Rings". The URL for
this sample asset, for the sake of argument, is
'http://simile.org/the-lord-of-the-rings-theatrical-release.dvd'. Now
suppose I have created a DVD player that will read metadata describing any
movie and use it to modify the way that movie is played back. For example,
my DVD player can read metadata describing scenes that depict violence and
remove them during playback of the movie.

Obviously the metadata read by the DVD player will have to include data
that identifies the parts of the overall movie that represent the selected
content. Using a URL to represent the content is insufficient -- we can't
create new URLs for every possible subregion of a movie, and even if we
did so, such an approach wouldn't help in finding and playing back parts
of the movie that do not correspond to that URL. Naming, as described in
section 3.2.7, has nothing to do with the URL for the asset. The purpose
of naming is to create a linkage between the metadata and the movie
subregion. Stepping out of our example, the purpose of Naming in this
document is to represent other assets in ways that URLs cannot. Such
linkages are necessarily specific to the type of data being indexed, so
they cannot be generalized to a single technology, but that doesn't mean
that we can't create a pattern around them.

While using URLs with semantics is one option, an alternative way to
specify a particular subpart of the movie is with a blob of RDF. E.g.,
there is a resource foo (no semantics) and assertions "foo fragment-of
the-lord-of-the-rings", "foo start-offset 300", and "foo end-offset 500".
Whatever semantics I intend to place in the URL, I can instead, without
any loss of expressive power, place in a blob of RDF statements. This
leaves me with URLs containing no semantics at all, which has a
consistency I like.

The rest of that paragraph is a (probably lame) attempt to link this
pattern to its structural equivalent in Haystack.

> To my mind, Semantic Web without the Web is just Semantic Filesystem.
>
> Perhaps, but nobody knows how to build a decent semantic filesystem.
> While one might argue that Google (analogue of what I said about
> centralized scenario above) is "just filesystem", it is in fact a big
> step forward because all the information it centralizes is interlinked
> in an interesting way.

I would not argue that Google is 'just filesystem'. I think that Google's
approach of gathering metadata and querying locally is a completely valid
approach to solving the distribution problem for metadata. There may be
other approaches that are also useful, especially for specific types of
data such as invariant data, or strongly partitioned (federated) data.

This is exactly the point I was making in previous email. While there are
many ways to cope with metadata distribution, collect+local-query is
perhaps the simplest possible way, and it reduces the problem to
understanding how to do local query -- which we still don't know how to
do, so I am recommending we tackle that first.
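A minimal sketch, in Python, of the bit-blob/document distinction
discussed above: immutable blobs get content-addressed MD5 names, while a
mutable document keeps one stable, randomly minted 128-bit name and a
history of blob names. The 'urn:md5:' and 'urn:rand:' forms and the
Document class are invented for illustration; they are not from the
thread.

    import hashlib
    import secrets

    def blob_url(blob: bytes) -> str:
        # Content-addressed name: immutably tied to these exact bits; if
        # the bits change, it is a different blob with a different name.
        # (Hypothetical 'urn:md5:' form, for illustration only.)
        return "urn:md5:" + hashlib.md5(blob).hexdigest()

    def random_url() -> str:
        # Random 128-bit name: collision is hell-freezes-over unlikely,
        # but only while every minter really draws uniformly at random.
        return "urn:rand:" + secrets.token_bytes(16).hex()

    class Document:
        # A mutable document: a stable name (the consistent address for
        # the top of the history tree) plus a history of immutable blobs.
        def __init__(self):
            self.url = random_url()
            self.history = []   # (when, blob_url) pairs

        def publish(self, when: str, blob: bytes) -> None:
            self.history.append((when, blob_url(blob)))

    doc = Document()
    doc.publish("2003-04-07", b"first draft")
    doc.publish("2003-04-08", b"revised draft")
    for when, url in doc.history:
        print(when, url)   # "at this time this document had these bits"

The document's own URL never changes, so it can serve as the top-of-tree
address that an MD5-sum URL cannot provide, while each revision keeps a
verifiable content-addressed name.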
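The blob-of-RDF naming pattern from the reply above, sketched with plain
Python tuples standing in for RDF statements. The predicates fragment-of,
start-offset, and end-offset come from the example in the message; the
fragments_of helper is a hypothetical stand-in for whatever the
violence-filtering DVD player would do.

    LOTR = "http://simile.org/the-lord-of-the-rings-theatrical-release.dvd"

    # The resource 'foo' carries no semantics of its own; the statements
    # about it do all the work.
    blob = [
        ("foo", "fragment-of",  LOTR),
        ("foo", "start-offset", 300),
        ("foo", "end-offset",   500),
    ]

    def fragments_of(graph, asset):
        # Resolve every named subregion of an asset to its offsets.
        for (s, p, o) in graph:
            if p == "fragment-of" and o == asset:
                start = next(v for (s2, p2, v) in graph
                             if s2 == s and p2 == "start-offset")
                end = next(v for (s2, p2, v) in graph
                           if s2 == s and p2 == "end-offset")
                yield (s, start, end)

    for name, start, end in fragments_of(blob, LOTR):
        print(name, start, end)   # -> foo 300 500

Note that the asset URL stays exactly as published: all the semantics
live in the statements, none in the name.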
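And a sketch of the collect+local-query pattern described in the closing
paragraphs, under the assumption that each remote source can simply be
asked for its statements. The source URLs and both helpers are
placeholders invented for illustration.

    # Stand-ins for remote metadata sources a harvester would fetch
    # over HTTP.
    sources = {
        "http://simile.org/metadata/a": [("foo", "fragment-of", "lotr-dvd")],
        "http://simile.org/metadata/b": [("foo", "start-offset", 300),
                                         ("foo", "end-offset",   500)],
    }

    def collect(srcs):
        # The distribution half: copy every source's statements into one
        # local store, the way a crawler does.
        local = []
        for triples in srcs.values():
            local.extend(triples)
        return local

    def query(store, s=None, p=None, o=None):
        # The local-query half: match a pattern against the collected
        # store; None acts as a wildcard.
        return [t for t in store
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    store = collect(sources)
    print(query(store, s="foo"))   # everything known locally about 'foo'

Everything hard then lives in query(), which is exactly the local-query
problem the message recommends tackling first.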
Received on Tuesday, 8 April 2003 00:43:13 UTC