Re: FW: notes on use cases from David R. Karger on 2003-04-07 (www-rdf-dspace@w3.org from April 2003)

From: David R. Karger <karger@theory.lcs.mit.edu>
Date: Mon, 7 Apr 2003 02:10:59 -0400
To: ks@micky.hpl.hp.com
CC: Mark_Butler@hplb.hpl.hp.com, www-rdf-dspace@w3.org
Message-Id: <200304070610.h376AxhW001879@harrier.lcs.mit.edu>
   Date: Fri, 4 Apr 2003 15:22:22 -0800
   From: Kevin Smathers <ks@micky.hpl.hp.com>
   Cc: SIMILE public list <www-rdf-dspace@w3.org>

   Hi David,

   > -----Original Message-----
   > From: David R. Karger [mailto:karger@theory.lcs.mit.edu]
   > Sent: 04 April 2003 06:09
   > To: mick.bass@hp.com; Mark_Butler@hplb.hpl.hp.com
   > Subject: notes on use cases
   [...]
   > 3.2.7
   > 
   > Two issues seem conflated here.  One is "how are things named" and the
   > obvious answer is "URIs".  A separate one is "Should there be
   > canonical URIs that can be deduced from what you are looking for"?  I
   > believe the answer to the second is no.  URIs should be opaque (for
   > example, random to avoid collisions).  The process of "figuring out
   > the right URI for something" is a type of search/retrieval problem.
   > Instead of squeezing this search/retrieval into a specialized "figure
   > out the URL" task, incorporate it in the standard search framework.
   > Any information used to define a canonical URL can instead be used as
   > metadata on the object, and any knowledge of how to construct the URL
   > can then be turned into a specification of its metadata.
   > 
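A minimal sketch of the opaque-URI point, assuming a toy in-memory store: the URI is minted at random and encodes nothing about the object, and "figuring out the right URI" is just an ordinary query over the object's metadata. The store, `mint`, and `find` are hypothetical names for illustration; only Python's standard `uuid` module is real.

```python
import uuid

# Hypothetical in-memory store mapping an opaque URI to its metadata.
store: dict[str, dict[str, str]] = {}


def mint(metadata: dict[str, str]) -> str:
    """Mint a random, opaque URI and record the object's metadata.

    Nothing about the object is encoded in the URI itself; the
    urn:uuid: form is just one convenient random identifier.
    """
    uri = uuid.uuid4().urn
    store[uri] = dict(metadata)
    return uri


def find(**constraints: str) -> list[str]:
    """Treat 'what is the URI for X?' as an ordinary metadata query."""
    return [
        uri
        for uri, md in store.items()
        if all(md.get(key) == value for key, value in constraints.items())
    ]


thesis = mint({"type": "thesis", "author": "Smith", "year": "2002"})
mint({"type": "article", "author": "Smith", "year": "2003"})

print(find(type="thesis", author="Smith") == [thesis])  # True
```

Any knowledge that would have gone into constructing a canonical URI (type, author, year) lives in the metadata instead, so the lookup reuses whatever search machinery already exists.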

   Randomness avoids collision?  I would say rather that federation
   avoids collision, randomness only invites it.  

Depends.  If URLs are random 128-bit integers, collisions are
hell-freezes-over unlikely.
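To put a number on "hell-freezes-over unlikely", here is a sketch using the standard birthday-bound approximation, P ≈ 1 - exp(-n(n-1)/2^(b+1)) for n random b-bit identifiers. The `urn:x-random:` prefix and both function names are assumptions for illustration.

```python
import math
import secrets


def random_uri(bits: int = 128) -> str:
    """Mint an opaque URI from `bits` random bits (hypothetical scheme)."""
    return "urn:x-random:" + secrets.token_hex(bits // 8)


def collision_probability(n: int, bits: int = 128) -> float:
    """Birthday-bound estimate of P(any collision) among n random IDs.

    Computes 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when
    the result is tiny.
    """
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))


print(random_uri())                                # 32 hex digits after the prefix
print(f"{collision_probability(10**12):.1e}")      # 1.5e-15
print(f"{collision_probability(2**64):.2f}")       # 0.39
```

A trillion 128-bit names give a collision chance around 1.5e-15; the risk only becomes real once the number of names approaches 2^64, the square root of the ID space.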

   In any case, I wasn't
   trying to describe a system of URL naming that incorporates a 
   query into the name, e.g.:

      http://google.com/search?q=switch+stoned+chicks&btnI=FeelingLucky
      http://www.apple.com/switch/ads/ellenfeiss.html

   Rather I intended the description of naming to read as a system for 
   identifying resources smaller than the atomic document.   

This is arguably convenient, in that it permanently binds the smaller
object to its containing object, giving you the semantics that if you
are looking for the smaller object it is a good subgoal to look for
the containing object.

But what if the contained object is inside two distinct objects?
Which URL is right?  What if someone doesn't know the object is
contained?  They will give it a third URL.

I will put in a plug for my favorite kind of URL, where possible: an MD5
hash of the object.  This works whenever the object is just its
bits (e.g., a document, an email address, a reified RDF triple), but
not for a person or a dynamic web site.  The nice thing about MD5 URLs is
that they provide a canonical naming rule that reduces the odds of
getting multiple names for the same object, reducing the need for
inference about equivalence.
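A sketch of MD5-based naming, assuming a hypothetical `urn:md5:` prefix (not a registered URN namespace) and, for triples, a simple N-Triples-style line as the canonical bits. It illustrates that two parties who hash the same bits arrive at the same name without coordinating, which also answers the "contained in two objects" problem: the name does not depend on the container. MD5 has since been shown vulnerable to deliberately constructed collisions, so `hashlib.sha256` is a drop-in swap if adversarial inputs matter.

```python
import hashlib


def md5_uri(content: bytes) -> str:
    """Name an object by the MD5 hash of its bits.

    The 'urn:md5:' prefix is an illustrative assumption, not a
    registered URN namespace.
    """
    return "urn:md5:" + hashlib.md5(content).hexdigest()


def triple_uri(subject: str, predicate: str, obj: str) -> str:
    """Name a reified RDF triple by hashing a canonical serialization.

    Assumes all three terms are IRIs and uses an N-Triples-style line as
    the canonical bits; literals or blank nodes would need more care.
    """
    line = f"<{subject}> <{predicate}> <{obj}> .\n"
    return md5_uri(line.encode("utf-8"))


# The same attachment found in two different messages gets one name,
# regardless of which container it was discovered in.
copy_in_msg_a = b"quarterly report v2"
copy_in_msg_b = b"quarterly " + b"report v2"
print(md5_uri(copy_in_msg_a) == md5_uri(copy_in_msg_b))  # True
print(md5_uri(b""))  # urn:md5:d41d8cd98f00b204e9800998ecf8427e
```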

   The URL
   remains the same; there just needs to be a way of interpreting additional
   constraints on the content within that URL.  I think this is analogous
   to the identification of a ViewPart to extract a particular view of
   an object within Haystack (do I remember a SongPreview10Seconds View
   Part, or something similar?).  In this case, however, the preview isn't meant
   to be an aspect of the part, but a name, which should be 
   interpreted by the part to extract the relevant information.

Having trouble parsing this.

   > 3.4
   > 
   > Distributed resources add a whole different scope.  That opens up a host
   > of nasty problems, of course.  We could avoid them by limiting our
   > dealing with distributed metadata to devising a simple
   > block-transfer protocol, getting all the metadata to a single
   > location, and dealing with it there.  The metadata might not be fully
   > up to date, but it avoids a lot of trouble.  Since even in the centralized
   > scenario everything is hard, perhaps we should defer distributed search?
   > 

   To my mind, Semantic Web without the Web is just Semantic Filesystem.

Perhaps, but nobody knows how to build a decent semantic filesystem.
While one might argue that Google (the analogue of the centralized
scenario I described above) is "just a filesystem", it is in fact a big
step forward because all the information it centralizes is interlinked
in interesting ways.
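A rough sketch of the centralize-first approach from 3.4: each source's metadata is copied wholesale into one store, after which a query can follow links that cross source boundaries. The triples, source names, and helper functions are hypothetical; a real RDF store would use an index rather than the linear scan shown here.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]


def block_transfer(sources: dict[str, Iterable[Triple]]) -> set[Triple]:
    """Copy every source's metadata wholesale into one central store.

    No incremental sync: the central copy is only as fresh as the last
    transfer, which is the trade-off accepted in 3.4.
    """
    central: set[Triple] = set()
    for triples in sources.values():
        central.update(triples)
    return central


def objects(store: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for (s, p, o) in store if s == subject and p == predicate}


# Hypothetical data: one source knows who wrote a document, another
# knows that person's name. Neither can answer the question alone.
library = [("urn:doc1", "dc:creator", "urn:person7")]
registry = [("urn:person7", "foaf:name", "Ann")]

central = block_transfer({"library": library, "registry": registry})
names = {
    name
    for author in objects(central, "urn:doc1", "dc:creator")
    for name in objects(central, author, "foaf:name")
}
print(names)  # {'Ann'}
```

The join only succeeds after centralization, which is the sense in which interlinked centralized data is more than "just a filesystem".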

d
Received on Monday, 7 April 2003 02:07:05 UTC