- From: Aaron Swartz <me@aaronsw.com>
- Date: Sun, 07 Apr 2002 12:59:02 -0500
- To: Sandro Hawke <sandro@w3.org>
- CC: RDF-Interest <www-rdf-interest@w3.org>
On 2002-04-07 12:19 PM, "Sandro Hawke" <sandro@w3.org> wrote:

> Interesting.  I've been thinking I need to write a "Why Hash? (Why Use
> URI-References as RDF Identifiers)" paper.  I'll try out the argument
> now.

I'm happy to skewer it. ;-)

> I think any string which wont be accidentally reused makes a decent
> universal identifier.  UUIDs/GUIDs/tags, are fine for this.
>
> Unfortunately, they don't help us locate any information about the
> things identified.

Huh? TAGs give us an email address or a domain name; surely those
things are useful for locating information. Even if they are not,
there are more general solutions, discussed below.

> It would be very nice to use RDF identifiers kind
> of like web address: you see one on the side of a bus, you type it in,
> and you get some interesting information.  For this to work with
> UUIDs, we'd need something like google in the background.  Seems like
> a bad idea.

I assume that by Google you mean some sort of centralized system. If
so, it simply isn't true that UUIDs require one. There are a number of
distributed hash table (DHT) systems which make the UUID->content
mapping easy with no centralization. Many people are working on such
systems, and new research is making them better and better with each
passing week. There are also IETF systems like RESCAP that allow for
resolution of URNs and other such systems.

The assumption that we need to tie ourselves to centralized systems
for secure naming is absurd. As Zooko's Law[1] states: "Names:
Decentralized, Secure, Human-Memorizable: Choose Two". For
decentralized and secure systems we have a whole series of tools at
our disposal (DHTs, cryptographic hashes, digital signatures, etc.),
and for human-readable and secure names we have tools that allow
anyone two people have in common to be a centralization point (Pet
Names[2], Google, DNS).
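
To make the "decentralized and secure" corner of that trade-off
concrete, here is a minimal sketch of a hash-keyed lookup with no
central index. It is an illustration only, not the protocol of any
particular DHT: the names ToyDHT, key_for and responsible_node, the
node labels, and the choice of SHA-256 with XOR distance
(Kademlia-style) are all assumptions made for the example.

```python
import hashlib
from typing import Iterable


def key_for(data: bytes) -> int:
    """Name content by its SHA-256 digest (an assumed choice of hash)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def responsible_node(key: int, node_ids: Iterable[int]) -> int:
    """Pick the node whose ID is closest to the key under XOR distance."""
    return min(node_ids, key=lambda node_id: node_id ^ key)


class ToyDHT:
    """In-memory stand-in for a network of peers (hypothetical)."""

    def __init__(self, node_names):
        # Each node gets an ID in the same space as the content keys,
        # plus its own local key -> content store.
        self.nodes = {key_for(name.encode()): {} for name in node_names}

    def put(self, data: bytes) -> int:
        key = key_for(data)
        self.nodes[responsible_node(key, self.nodes)][key] = data
        return key

    def get(self, key: int) -> bytes:
        # Every peer computes the same responsible node from the key
        # alone, so no central directory is consulted.
        data = self.nodes[responsible_node(key, self.nodes)][key]
        # The name is the hash of the content, so a reader can verify
        # what it receives without trusting the node that served it.
        if key_for(data) != key:
            raise ValueError("content does not match its name")
        return data


dht = ToyDHT(["node-a", "node-b", "node-c"])
name = dht.put(b"some RDF description")
assert dht.get(name) == b"some RDF description"
```

The resulting name is a long number nobody would type in from the
side of a bus, which is the third corner of the trade-off being given
up.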
[1] http://zooko.com/distnames.html
[2] http://www.erights.org/elib/capability/pnml.html

The reason URIs have always appealed to me is that they allow one to
choose any of these trade-offs while staying within a well-known
system. While DNS-based URIs are currently popular, I expect that with
the advent of systems like DHTs and Pet Names their popularity will
fall off.

> The URI-Reference approach (which I've adopted, after flirting with
> tag URIs) is to use URI-References as object identifiers, and URIs as
> knowledge-base identifiers.

The problem with this, as Roy Fielding, Uche Ogbuji and I point out,
is that you're tying your identifiers to the specifics of a
serialization syntax and closing yourself off from all the tools the
Web has for systems (redirects, content-negotiation, access-control,
404s).

HTTP URIs provide a useful system of hierarchical delegation.
URI-References put you at the mercy of whoever has posted the latest
MIME type draft for your serialization syntax. With HTTP URIs I can
tell if the author has created the URI or not (do I get a 404?), but
with URI-references whoever wrote the fragment spec can create all the
URI-refs, and meanings for them, that they want.

> A nearby approach, which I don't like, is to use URIs to denote
> everything.  With this plan, the owner has the same ability to publish
> easily-found information, but the whole system seems more confusing.

Confusing is in the eye of the beholder. There are a zillion confusing
corner-cases with fragments, but I can see only one with full URIs
(one that recent Internet-Drafts are attempting to solve).

Some of the corner-cases:

- What happens when (like with XPointer) someone retroactively adds
  new fragments to your document? If someone links to your 9th <p>
  tag, must you always have 9 <p> tags in the document?
- What happens when the document changes? Does the meaning of the
  URI-ref change?
- What happens when you get back (via con-neg) a serialization syntax
  in which the fragment is illegal? [Think of getting back HTML where
  fragments with numbers are illegal, or an audio file where they are
  required.]
- If the fragment is tied to a series of bits (like a section of an
  audio file) does the meaning of the fragment change when the bits
  change? (i.e. Are fragments late-bound?)

Fragments are cloaked in mystery and differing interpretations of
myriad specs; this is not a sound way to build a global identifier
space.

> Now we're back to wondering what exactly http://www.w3.org/ denotes.

The recent Internet-Draft proposing Repr-Type and Resource-Type
headers makes this clear. If you get a

    Resource-Type: http://xmlns.com/foaf/0.1/Person

header, then it's a person. Alternatively, you can just ask the W3C or
stick with "safe" URI systems like UUIDs, GUIDs, TAGs or URNs.

> With the previous plan it's clear: it denote a collection of
> information (published by the W3C, probably about the W3C and other
> things).

"A collection of information" doesn't seem particularly clear to me...

> If you use URIs for everything, you're essentially running a
> great risk of accidental identifier re-use.

I think this is hardly as serious as the accidental-re-use problems
with fragments. What happens if I point to a section of an MP3 file
where TimBL describes "test of independent invention" and then someone
adds an introduction to the MP3, bumping my selected fragment-space to
a bunch of applause?

I think you can see the problems now, but if you can't I'm happy to
discuss it further.

All the best,
-- 
"Aaron Swartz"            |            The Semantic Web
<mailto:me@aaronsw.com>   | <http://logicerror.com/semanticWeb-long>
<http://www.aaronsw.com/> |      i'm working to make it happen
Received on Sunday, 7 April 2002 13:59:06 UTC